
Utility-based Adaptive Teaching Strategies using Bayesian Theory of Mind (2309.17275v1)

Published 29 Sep 2023 in cs.LG

Abstract: Good teachers always tailor their explanations to the learners. Cognitive scientists model this process under the rationality principle: teachers try to maximise the learner's utility while minimising teaching costs. To this end, human teachers seem to build mental models of the learner's internal state, a capacity known as Theory of Mind (ToM). Inspired by cognitive science, we build on Bayesian ToM mechanisms to design teacher agents that, like humans, tailor their teaching strategies to the learners. Our ToM-equipped teachers construct models of learners' internal states from observations and leverage them to select demonstrations that maximise the learners' rewards while minimising teaching costs. Our experiments in simulated environments demonstrate that learners taught this way are more efficient than those taught in a learner-agnostic way. This effect gets stronger when the teacher's model of the learner better aligns with the actual learner's state, either using a more accurate prior or after accumulating observations of the learner's behaviour. This work is a first step towards social machines that teach us and each other, see https://teacher-with-tom.github.io.
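The selection rule described in the abstract can be illustrated with a toy sketch. The teacher holds a belief over learner types, updates it by Bayes' rule after observing the learner, and picks the demonstration with the highest expected learner reward minus teaching cost. This is not the paper's implementation. The learner types, demonstrations, rewards, costs, and likelihoods below are hypothetical values chosen for illustration only.

```python
from typing import Dict

Belief = Dict[str, float]


def bayes_update(prior: Belief, likelihood: Belief) -> Belief:
    """Return the posterior over learner types.

    `likelihood[t]` is the probability of the observed learner behaviour
    if the learner were of type `t`.
    """
    unnormalised = {t: prior[t] * likelihood[t] for t in prior}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("observation has zero probability under every type")
    return {t: p / total for t, p in unnormalised.items()}


def select_demo(belief: Belief,
                reward: Dict[str, Dict[str, float]],
                cost: Dict[str, float]) -> str:
    """Pick the demonstration maximising expected reward minus cost."""
    def utility(demo: str) -> float:
        expected_reward = sum(belief[t] * reward[demo][t] for t in belief)
        return expected_reward - cost[demo]
    return max(reward, key=utility)


# Hypothetical setup: two learner types and two demonstrations.
prior = {"novice": 0.5, "expert": 0.5}
# reward[demo][learner_type]: reward the learner earns after seeing the demo.
reward = {
    "short": {"novice": 2.0, "expert": 10.0},
    "full": {"novice": 10.0, "expert": 10.0},
}
cost = {"short": 1.0, "full": 3.0}

# With an uninformed prior, the costly but safe demonstration wins.
choice_before = select_demo(prior, reward, cost)

# Assumed likelihood of some observed behaviour under each learner type.
likelihood = {"novice": 0.1, "expert": 0.9}
posterior = bayes_update(prior, likelihood)

# Once the teacher believes the learner is an expert, the cheap demo wins.
choice_after = select_demo(posterior, reward, cost)

print(choice_before, posterior, choice_after)
```

In this toy setting the teacher switches from the expensive demonstration to the cheap one as its belief sharpens, which mirrors the abstract's claim that teaching improves when the teacher's model of the learner better matches the learner's actual state.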
