POAR: Efficient Policy Optimization via Online Abstract State Representation Learning (2109.08642v2)
Abstract: While the rapid progress of deep learning fuels end-to-end reinforcement learning (RL), its direct application, especially in high-dimensional settings such as robotic scenarios, still suffers from low sample efficiency. State Representation Learning (SRL) has therefore been proposed to encode task-relevant features from complex sensory data into low-dimensional states. However, SRL is usually implemented with a decoupled strategy in which the observation-to-state mapping is learned separately from the policy, an approach that is prone to overfitting. To address this problem, we summarize the state-of-the-art (SOTA) SRL sub-tasks of previous works and present a new algorithm, Policy Optimization via Abstract Representation (POAR), which integrates SRL into the policy optimization phase. First, we use the RL loss to help update the SRL model, so that the learned states evolve to meet the demands of RL while retaining a clear physical interpretation. Second, we introduce a dynamic loss-weighting mechanism so that the two models can efficiently adapt to each other. Third, we introduce a new SRL prior, domain resemblance, which leverages expert demonstrations to improve the learned representations. Finally, we provide real-time access to the state graph to monitor the course of learning. Experiments indicate that POAR significantly outperforms SOTA RL algorithms and decoupled SRL strategies in terms of sample efficiency and final rewards. We empirically verify that POAR efficiently handles high-dimensional tasks and facilitates training real-life robots directly from scratch.
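To make the combined objective concrete, below is a minimal PyTorch sketch of the two mechanisms the abstract names: a dynamic loss-weighting scheme that lets the SRL and RL losses adapt to each other, and a domain-resemblance term that pulls encoded agent states toward encoded expert demonstrations. The EMA-based weighting, the RBF-kernel MMD (following the kernel two-sample test of Gretton et al., 2012), and the names `mmd_rbf` and `DynamicWeights` are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def mmd_rbf(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Squared MMD between two samples under an RBF kernel
    (kernel two-sample test, Gretton et al., 2012)."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

class DynamicWeights:
    """Rescale each loss term by an exponential moving average of its own
    magnitude, so no single objective dominates while the encoder and the
    policy adapt to each other. An assumed scheme, not the paper's."""
    def __init__(self, names, beta: float = 0.99):
        self.ema = {n: 1.0 for n in names}
        self.beta = beta

    def __call__(self, losses: dict) -> torch.Tensor:
        total = torch.tensor(0.0)
        for name, loss in losses.items():
            # Track the running magnitude of this term (no gradient flows here).
            self.ema[name] = self.beta * self.ema[name] + (1 - self.beta) * float(loss.detach())
            # Each term then contributes at a comparable, normalized scale.
            total = total + loss / (self.ema[name] + 1e-8)
        return total

# Toy usage with random tensors standing in for encoder outputs and losses.
states = torch.randn(64, 16)         # encoded on-policy observations
expert_states = torch.randn(64, 16)  # encoded expert demonstrations
losses = {
    "rl": torch.tensor(1.3),                        # e.g. a PPO surrogate loss
    "recon": torch.tensor(0.4),                     # autoencoder SRL prior
    "resemblance": mmd_rbf(states, expert_states),  # domain-resemblance prior
}
total = DynamicWeights(losses.keys())(losses)
```

Normalizing each term by its running magnitude is one simple way to keep the RL gradient from being swamped by the SRL priors early in training; the paper's actual weighting schedule may differ.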
[2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. 
[2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). 
PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. 
In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. 
[2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. 
[2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. 
[2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. 
[2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. 
[2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). 
In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). 
IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Graves, A., Graves, A.: Long short-term memory. 
Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017)
- Finn, C., Levine, S., Abbeel, P.: Guided cost learning: Deep inverse optimal control via policy optimization. In: International Conference on Machine Learning, pp. 49–58 (2016). PMLR Chen et al. [2021] Chen, Z., Chen, B., Xie, S., Gong, L., Liu, C., Zhang, Z., Zhang, J.: Efficiently training on-policy actor-critic networks in robotic deep reinforcement learning with demonstration-like sampled exploration. In: 2021 3rd International Symposium on Robotics & Intelligent Manufacturing Technology (ISRIMT), pp. 292–298 (2021). IEEE Garcıa and Fernández [2015] Garcıa, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research 16(1), 1437–1480 (2015) Chen et al. [2023] Chen, Z., Chen, B., He, T., Gong, L., Liu, C.: Progressive adaptive chance-constrained safeguards for reinforcement learning. arXiv preprint arXiv:2310.03379 (2023) Bellman [1966] Bellman, R.: Dynamic programming. Science 153(3731), 34–37 (1966) Huang et al. [2020] Huang, W., Mordatch, I., Pathak, D.: One policy to control them all: Shared modular policies for agent-agnostic control. In: International Conference on Machine Learning, pp. 4455–4464 (2020). PMLR Bengio et al. [2013] Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013) Chen et al. [2020] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. 
[2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Chen, Z., Chen, B., Xie, S., Gong, L., Liu, C., Zhang, Z., Zhang, J.: Efficiently training on-policy actor-critic networks in robotic deep reinforcement learning with demonstration-like sampled exploration. In: 2021 3rd International Symposium on Robotics & Intelligent Manufacturing Technology (ISRIMT), pp. 292–298 (2021). IEEE Garcıa and Fernández [2015] Garcıa, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research 16(1), 1437–1480 (2015) Chen et al. [2023] Chen, Z., Chen, B., He, T., Gong, L., Liu, C.: Progressive adaptive chance-constrained safeguards for reinforcement learning. arXiv preprint arXiv:2310.03379 (2023) Bellman [1966] Bellman, R.: Dynamic programming. Science 153(3731), 34–37 (1966) Huang et al. [2020] Huang, W., Mordatch, I., Pathak, D.: One policy to control them all: Shared modular policies for agent-agnostic control. In: International Conference on Machine Learning, pp. 4455–4464 (2020). PMLR Bengio et al. [2013] Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013) Chen et al. [2020] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. 
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. 
In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. 
Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Garcıa, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research 16(1), 1437–1480 (2015) Chen et al. [2023] Chen, Z., Chen, B., He, T., Gong, L., Liu, C.: Progressive adaptive chance-constrained safeguards for reinforcement learning. arXiv preprint arXiv:2310.03379 (2023) Bellman [1966] Bellman, R.: Dynamic programming. Science 153(3731), 34–37 (1966) Huang et al. [2020] Huang, W., Mordatch, I., Pathak, D.: One policy to control them all: Shared modular policies for agent-agnostic control. In: International Conference on Machine Learning, pp. 4455–4464 (2020). PMLR Bengio et al. [2013] Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013) Chen et al. [2020] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. 
[2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Chen, Z., Chen, B., He, T., Gong, L., Liu, C.: Progressive adaptive chance-constrained safeguards for reinforcement learning. arXiv preprint arXiv:2310.03379 (2023) Bellman [1966] Bellman, R.: Dynamic programming. Science 153(3731), 34–37 (1966) Huang et al. [2020] Huang, W., Mordatch, I., Pathak, D.: One policy to control them all: Shared modular policies for agent-agnostic control. In: International Conference on Machine Learning, pp. 4455–4464 (2020). PMLR Bengio et al. [2013] Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013) Chen et al. [2020] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. 
[2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. 
[2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Bellman, R.: Dynamic programming. Science 153(3731), 34–37 (1966) Huang et al. [2020] Huang, W., Mordatch, I., Pathak, D.: One policy to control them all: Shared modular policies for agent-agnostic control. In: International Conference on Machine Learning, pp. 4455–4464 (2020). PMLR Bengio et al. [2013] Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013) Chen et al. [2020] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. 
In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). 
IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 
Graves [2012] Graves, A.: Long short-term memory. In: Supervised Sequence Labelling with Recurrent Neural Networks, pp. 37–45. Springer (2012)
Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: OpenAI Baselines. https://github.com/openai/baselines (2017)
IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 
5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. 
[2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. 
[2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. 
Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. 
In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. 
The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. 
In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. 
[2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. 
[2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. 
In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. 
Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. 
Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. 
In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. 
[2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017)
9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. 
[2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. 
In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. 
The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. 
[2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. 
Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. 
[2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. 
Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. 
[2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. 
In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. 
Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. 
[2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. 
[2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. 
Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. 
[2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. 
Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. 
[2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. 
[2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. 
[2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. 
In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. 
arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. 
[2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. 
[2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). 
IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). 
IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. 
In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. 
The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017)
- Garcıa, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research 16(1), 1437–1480 (2015) Chen et al. [2023] Chen, Z., Chen, B., He, T., Gong, L., Liu, C.: Progressive adaptive chance-constrained safeguards for reinforcement learning. arXiv preprint arXiv:2310.03379 (2023) Bellman [1966] Bellman, R.: Dynamic programming. Science 153(3731), 34–37 (1966) Huang et al. [2020] Huang, W., Mordatch, I., Pathak, D.: One policy to control them all: Shared modular policies for agent-agnostic control. In: International Conference on Machine Learning, pp. 4455–4464 (2020). PMLR Bengio et al. [2013] Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013) Chen et al. [2020] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. 
[2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Chen, Z., Chen, B., He, T., Gong, L., Liu, C.: Progressive adaptive chance-constrained safeguards for reinforcement learning. arXiv preprint arXiv:2310.03379 (2023) Bellman [1966] Bellman, R.: Dynamic programming. Science 153(3731), 34–37 (1966) Huang et al. [2020] Huang, W., Mordatch, I., Pathak, D.: One policy to control them all: Shared modular policies for agent-agnostic control. In: International Conference on Machine Learning, pp. 4455–4464 (2020). PMLR Bengio et al. [2013] Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013) Chen et al. [2020] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. 
In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. 
[2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Bellman, R.: Dynamic programming. Science 153(3731), 34–37 (1966) Huang et al. [2020] Huang, W., Mordatch, I., Pathak, D.: One policy to control them all: Shared modular policies for agent-agnostic control. In: International Conference on Machine Learning, pp. 4455–4464 (2020). PMLR Bengio et al. [2013] Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013) Chen et al. [2020] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. 
[2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. 
[2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Huang, W., Mordatch, I., Pathak, D.: One policy to control them all: Shared modular policies for agent-agnostic control. In: International Conference on Machine Learning, pp. 4455–4464 (2020). PMLR Bengio et al. [2013] Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013) Chen et al. [2020] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. 
[2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013) Chen et al. [2020] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. 
Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. 
[2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. 
[2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. 
In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. 
In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. 
[2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. 
[2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. 
[2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. 
In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. 
[2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. 
Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
- Chen, Z., Chen, B., He, T., Gong, L., Liu, C.: Progressive adaptive chance-constrained safeguards for reinforcement learning. arXiv preprint arXiv:2310.03379 (2023)
Bellman, R.: Dynamic programming. Science 153(3731), 34–37 (1966)
Huang, W., Mordatch, I., Pathak, D.: One policy to control them all: Shared modular policies for agent-agnostic control. In: International Conference on Machine Learning, pp. 4455–4464 (2020). PMLR
Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(8), 1798–1828 (2013)
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020)
Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in Neural Information Processing Systems 32 (2019)
IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. 
[2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. 
[2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. 
Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. 
[2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. 
In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. 
[2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. 
[2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. 
[2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 
4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. 
[2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. 
Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. 
[2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. 
[2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. 
[2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. 
Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. 
Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. 
[2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. 
In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. 
In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). 
IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Graves, A., Graves, A.: Long short-term memory. 
Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017)
- Bellman, R.: Dynamic programming. Science 153(3731), 34–37 (1966) Huang et al. [2020] Huang, W., Mordatch, I., Pathak, D.: One policy to control them all: Shared modular policies for agent-agnostic control. In: International Conference on Machine Learning, pp. 4455–4464 (2020). PMLR Bengio et al. [2013] Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013) Chen et al. [2020] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. 
[2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Huang, W., Mordatch, I., Pathak, D.: One policy to control them all: Shared modular policies for agent-agnostic control. In: International Conference on Machine Learning, pp. 4455–4464 (2020). PMLR Bengio et al. [2013] Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013) Chen et al. [2020] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. 
[2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. 
[2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. 
In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. 
Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. 
Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. 
[2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. 
The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. 
[2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. 
[2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. 
[2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. 
Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. 
[2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. 
In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. 
Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. 
[2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. 
[2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. 
In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. 
In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. 
[2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. 
[2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. 
[2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. 
In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. 
[2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. 
Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. 
In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. 
[2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017)
- Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence 35(8), 1798–1828 (2013) Chen et al. [2020] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. 
Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. 
[2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607 (2020). PMLR He et al. [2020] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. 
[2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. 
[2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. 
In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). 
IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. 
[2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. 
[2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. 
Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. 
[2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. 
In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. 
[2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. 
[2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. 
[2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 
[2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. 
Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. 
[2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. 
Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. 
[2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. 
In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. 
Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. 
[2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. 
[2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. 
- He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020) Bachman et al. [2019] Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. 
[2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). 
IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Bachman, P., Hjelm, R.D., Buchwalter, W.: Learning representations by maximizing mutual information across views. Advances in neural information processing systems 32 (2019) Schwarzer et al. [2020] Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. 
[2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). 
IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Anand, A., Goel, R., Hjelm, R.D., Courville, A., Bachman, P.: Data-efficient reinforcement learning with self-predictive representations. arXiv preprint arXiv:2007.05929 (2020) Raffin et al. [2019] Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. 
In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. 
Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Raffin, A., Hill, A., Traoré, R., Lesort, T., Díaz-Rodríguez, N., Filliat, D.: Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. arXiv preprint arXiv:1901.08651 (2019) Zakka et al. [2022] Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. 
[2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. 
In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. 
Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. 
[2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. 
[2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. 
[2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. 
Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. 
[2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. 
The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. 
In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 
5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. 
[2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. 
[2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017)
[2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. 
In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. 
Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. 
[2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. 
The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. 
[2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. 
[2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. 
[2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. 
In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. 
Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. 
Neural Networks 108, 379–392 (2018)
- Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and Multidisciplinary Optimization 26, 369–395 (2004)
- Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: RobotDRLSim: A real-time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing
- Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and Autonomous Systems 57(5), 469–483 (2009)
- Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012)
- Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE
- Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018)
- He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022)
- Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
- Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE
- Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer (2011)
- Todorov, E., Erez, T., Tassa, Y.: MuJoCo: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE
- Graves, A.: Long short-term memory. In: Supervised Sequence Labelling with Recurrent Neural Networks, pp. 37–45. Springer (2012)
- Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: OpenAI Baselines. https://github.com/openai/baselines (2017)
- Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014)
- Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE
- Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013)
- Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: DeepMDP: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR
- Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR
- Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 (2015)
- Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR
- Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE
- Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: UniGrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020)
- Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021)
- Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent: A new approach to self-supervised learning. Advances in Neural Information Processing Systems 33, 21271–21284 (2020)
- Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey.
The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. 
arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. 
IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. 
[2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. 
[2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. 
[2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. 
[2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. 
[2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. 
In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. 
Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. 
Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. 
The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. 
The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 
5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. 
[2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. 
[2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. 
[2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. 
[2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017)
- Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., Dwibedi, D.: Xirl: Cross-embodiment inverse reinforcement learning. In: Conference on Robot Learning, pp. 537–546 (2022). PMLR Liu et al. [2018] Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. 
[2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. 
[2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. 
[2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018) Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. 
Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. 
[2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. 
[2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. 
In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. 
Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
- Liu, Y., Gupta, A., Abbeel, P., Levine, S.: Imitation from observation: Learning to imitate behaviors from raw video via context translation. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1118–1125 (2018). IEEE
- Yu et al. [2018] Yu, T., Finn, C., Xie, A., Dasari, S., Zhang, T., Abbeel, P., Levine, S.: One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557 (2018)
- Ding et al. [2023] Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR
- Martín-Martín et al. [2019] Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE
- Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: UniGrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020)
- Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021)
- Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent: A new approach to self-supervised learning. Advances in Neural Information Processing Systems 33, 21271–21284 (2020)
- Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014)
- Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application.
In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. 
In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. 
In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. 
[2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. 
The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. 
[2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. 
[2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). 
IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 
5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. 
In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. 
[2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. 
In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 
Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. 
[2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. 
In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. 
In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). 
IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Graves, A., Graves, A.: Long short-term memory. 
Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017)
- Ding, M., Xu, Y., Chen, Z., Cox, D.D., Luo, P., Tenenbaum, J.B., Gan, C.: Embodied concept learner: Self-supervised learning of concepts and mapping through instruction following. In: Conference on Robot Learning, pp. 1743–1754 (2023). PMLR
- Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE
- Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: UniGrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020)
- Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021)
- Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent: A new approach to self-supervised learning. Advances in Neural Information Processing Systems 33, 21271–21284 (2020)
- Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014)
- Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE
- Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013)
- Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: DeepMDP: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR
- Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR
- Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 (2015)
- Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018)
- Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and Multidisciplinary Optimization 26, 369–395 (2004)
- Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing
- Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and Autonomous Systems 57(5), 469–483 (2009)
- Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012)
- Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE
- He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022)
- Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
- Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE
- Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013)
- Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer (2011)
- Todorov, E., Erez, T., Tassa, Y.: MuJoCo: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE
- Graves, A.: Long short-term memory. Supervised Sequence Labelling with Recurrent Neural Networks, 37–45 (2012)
- Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: OpenAI Baselines (2017)
Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. 
[2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. 
[2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. 
In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. 
In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. 
- Martín-Martín, R., Lee, M.A., Gardner, R., Savarese, S., Bohg, J., Garg, A.: Variable impedance control in end-effector space: An action space for reinforcement learning in contact-rich tasks. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1010–1017 (2019). IEEE
Shao et al. [2020] Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: UniGrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020)
Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021)
[2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. 
In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. 
In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. 
The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. 
[2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. 
The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 
5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. 
[2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. 
[2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. 
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. 
[2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. 
[2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017)
- Shao, L., Ferreira, F., Jorda, M., Nambiar, V., Luo, J., Solowjow, E., Ojea, J.A., Khatib, O., Bohg, J.: Unigrasp: Learning a unified model to grasp with multifingered robotic hands. IEEE Robotics and Automation Letters 5(2), 2286–2293 (2020) Schwarzer et al. [2021] Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. 
- Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, R.D., Bachman, P., Courville, A.C.: Pretraining representations for data-efficient reinforcement learning. Advances in Neural Information Processing Systems 34, 12686–12699 (2021) Grill et al. [2020] Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent-a new approach to self-supervised learning. Advances in neural information processing systems 33, 21271–21284 (2020) Jonschkowski and Brock [2014] Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. 
[2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. 
[2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. 
Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. 
Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. 
[2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. 
[2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. 
[2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. 
The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). 
IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017)
- Jonschkowski, R., Brock, O.: State representation learning in robotics: Using prior knowledge about physical interaction. In: Robotics: Science and Systems (2014) Lange et al. [2012] Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. 
The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Lange, S., Riedmiller, M., Voigtländer, A.: Autonomous reinforcement learning on raw visual input data in a real world application. In: The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2012). IEEE Jetchev et al. [2013] Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. [2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. 
Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE Hjelm et al. 
[2018] Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: Deepmdp: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 25, 2 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009) Gretton et al. [2012] Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012) Munk et al. [2016] Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. 
- Jetchev, N., Lang, T., Toussaint, M.: Learning grounded relational symbols from continuous data for abstract reasoning. In: Proceedings of the 2013 ICRA Workshop on Autonomous Learning (2013) Gelada et al. [2019] Gelada, C., Kumar, S., Buckman, J., Nachum, O., Bellemare, M.G.: DeepMDP: Learning continuous latent space models for representation learning. In: International Conference on Machine Learning, pp. 2170–2179 (2019). PMLR Guo et al. [2020] Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR Finn et al. [2015] Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 (2015) Lesort et al. [2018] Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018) Marler and Arora [2004] Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004) Sun et al. [2021] Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: RobotDRLSim: A real-time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing Argall et al. [2009] Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and Autonomous Systems 57(5), 469–483 (2009)
- Todorov, E., Erez, T., Tassa, Y.: MuJoCo: A physics engine for model-based control.
In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018) He et al. [2022] He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022) Kingma and Ba [2014] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. 
[2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014) Xie et al. [2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. 
[2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017)
- Guo, Z.D., Pires, B.A., Piot, B., Grill, J.-B., Altché, F., Munos, R., Azar, M.G.: Bootstrap latent-predictive representations for multitask reinforcement learning. In: International Conference on Machine Learning, pp. 3875–3886 (2020). PMLR
- Finn, C., Tan, X.Y., Duan, Y., Darrell, T., Levine, S., Abbeel, P.: Learning visual feature spaces for robotic manipulation with deep spatial autoencoders. arXiv preprint arXiv:1509.06113 (2015)
[2023] Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE Kober et al. [2013] Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013) Berlinet and Thomas-Agnan [2011] Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer, ??? (2011) Todorov et al. [2012] Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. 
Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE Graves and Graves [2012] Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Graves, A., Graves, A.: Long short-term memory. Supervised sequence labelling with recurrent neural networks, 37–45 (2012) Dhariwal et al. [2017] Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017) Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: Openai baselines (2017)
- Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., Filliat, D.: State representation learning for control: An overview. Neural Networks 108, 379–392 (2018)
- Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization 26, 369–395 (2004)
- Sun, T., Gong, L., Li, X., Xie, S., Chen, Z., Hu, Q., Filliat, D.: Robotdrlsim: A real time robot simulation platform for reinforcement learning and human interactive demonstration learning. In: Journal of Physics: Conference Series, vol. 1746, p. 012035 (2021). IOP Publishing
- Argall, B.D., Chernova, S., Veloso, M., Browning, B.: A survey of robot learning from demonstration. Robotics and autonomous systems 57(5), 469–483 (2009)
- Gretton, A., Borgwardt, K.M., Rasch, M.J., Schölkopf, B., Smola, A.: A kernel two-sample test. The Journal of Machine Learning Research 13(1), 723–773 (2012)
- Munk, J., Kober, J., Babuška, R.: Learning state representation for deep actor-critic control. In: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 4667–4673 (2016). IEEE
- Hjelm, R.D., Fedorov, A., Lavoie-Marchildon, S., Grewal, K., Bachman, P., Trischler, A., Bengio, Y.: Learning deep representations by mutual information estimation and maximization. arXiv preprint arXiv:1808.06670 (2018)
- He, T., Zhang, Y., Ren, K., Liu, M., Wang, C., Zhang, W., Yang, Y., Li, D.: Reinforcement learning with automated auxiliary loss search. Advances in Neural Information Processing Systems 35, 1820–1834 (2022)
- Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
- Xie, S., Gong, L., Chen, Z., Chen, B.: Simulation of real-time collision-free path planning method with deep policy network in human-robot interaction scenario. In: 2023 International Conference on Advanced Robotics and Mechatronics (ICARM), pp. 360–365 (2023). IEEE
- Kober, J., Bagnell, J.A., Peters, J.: Reinforcement learning in robotics: A survey. The International Journal of Robotics Research 32(11), 1238–1274 (2013)
- Berlinet, A., Thomas-Agnan, C.: Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer (2011)
- Todorov, E., Erez, T., Tassa, Y.: Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026–5033 (2012). IEEE
- Graves, A.: Long short-term memory. In: Supervised Sequence Labelling with Recurrent Neural Networks, pp. 37–45. Springer (2012)
- Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., Wu, Y., Zhokhov, P.: OpenAI Baselines. GitHub repository, https://github.com/openai/baselines (2017)
- Zhaorun Chen
- Siqi Fan
- Yuan Tan
- Liang Gong
- Binhao Chen
- Te Sun
- David Filliat
- Natalia Díaz-Rodríguez
- Chengliang Liu