
Transductive Reward Inference on Graph (2402.03661v1)

Published 6 Feb 2024 in cs.LG and cs.AI

Abstract: In this study, we present a transductive inference approach on a reward information propagation graph, which enables the effective estimation of rewards for unlabelled data in offline reinforcement learning. Reward inference is key to learning effective policies in practical scenarios where direct environmental interactions are either too costly or unethical and reward functions are rarely accessible, such as in healthcare and robotics. Our research focuses on developing a reward inference method based on the contextual properties of information propagation on graphs, which capitalizes on a constrained number of human reward annotations to infer rewards for unlabelled data. We leverage both the available data and the limited reward annotations to construct a reward propagation graph, wherein the edge weights incorporate various influential factors pertaining to the rewards. Subsequently, we employ the constructed graph for transductive reward inference, thereby estimating rewards for unlabelled data. Furthermore, we establish the existence of a fixed point during the iterations of the transductive inference process and demonstrate that it converges at least to a local optimum. Empirical evaluations on locomotion and robotic manipulation tasks validate the effectiveness of our approach, and applying the inferred rewards improves performance in offline reinforcement learning tasks.
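The abstract describes building a reward propagation graph from a small set of human reward annotations and then running an iterative transductive inference that converges to a fixed point. As an illustration only, the following is a minimal NumPy sketch of that style of graph-based propagation. It assumes Gaussian-kernel edge weights and the standard normalized propagation update R ← αSR + (1−α)R0 from the label-propagation literature; the paper's actual edge-weight factors and update rule are not specified in this excerpt, so the function names and parameters below are hypothetical.

```python
# Minimal sketch of graph-based transductive reward propagation.
# Assumptions (not from the paper): Gaussian-kernel edge weights and the
# standard normalized propagation update R <- alpha * S @ R + (1 - alpha) * R0.
import numpy as np

def build_edge_weights(features, sigma=1.0):
    """Dense affinity matrix from pairwise feature distances."""
    d2 = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def propagate_rewards(W, rewards, labelled_mask, alpha=0.9, n_iter=100, tol=1e-6):
    """Iterate R <- alpha * S @ R + (1 - alpha) * R0 until a fixed point."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d) + 1e-12)  # symmetric normalisation
    R0 = np.where(labelled_mask, rewards, 0.0)  # labelled rewards stay anchored
    R = R0.copy()
    for _ in range(n_iter):
        R_next = alpha * S @ R + (1.0 - alpha) * R0
        if np.max(np.abs(R_next - R)) < tol:  # reached the fixed point
            return R_next
        R = R_next
    return R

# Toy usage: 100 transitions, only the first 10 carry human reward annotations.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8))      # e.g. (state, action) features
rewards = np.zeros(100)
rewards[:10] = rng.uniform(size=10)    # limited reward annotations
mask = np.zeros(100, dtype=bool)
mask[:10] = True
inferred = propagate_rewards(build_edge_weights(feats), rewards, mask)
```

With α < 1 and a normalized affinity matrix, this update is a contraction, so the iteration converges to a unique fixed point; the paper's convergence claim concerns its own graph construction and is stated only as convergence to at least a local optimum.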

