Controlling Type Confounding in Ad Hoc Teamwork with Instance-wise Teammate Feedback Rectification (2306.10944v1)

Published 19 Jun 2023 in cs.MA

Abstract: Ad hoc teamwork requires an agent to cooperate with unknown teammates without prior coordination. Many works propose to abstract teammate instances into a high-level representation of types and then pre-train the best response for each type. However, most of them do not consider the distribution of teammate instances within a type. This can expose the agent to the hidden risk of type confounding: in the worst case, the best response for an abstract teammate type can be the worst response for every specific instance of that type. This work addresses the issue through the lens of causal inference. We first show theoretically that the phenomenon stems from a spurious correlation introduced by the uncontrolled teammate distribution. We then propose our solution, CTCAT, which disentangles this correlation through an instance-wise teammate feedback rectification: interactions with teammate instances that share a type are reweighted to reduce the influence of type confounding. We evaluate CTCAT in multiple domains, including classic ad hoc teamwork tasks and real-world scenarios. Results show that CTCAT is robust to type confounding, a practical issue that directly threatens the robustness of trained agents but has gone unnoticed in previous works.
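
To make the type-confounding failure mode concrete, here is a minimal, self-contained Python sketch. It is our own illustration, not the paper's CTCAT algorithm: the instances "A" and "B", the skewed 90/10 instance distribution, and the payoff numbers are all hypothetical. It shows how pooling feedback over a whole teammate type under an uncontrolled instance distribution can crown a risky action as the "best response", and how a simple instance-wise reweighting (one plausible reading of the rectification idea) removes that confound.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the paper): one abstract teammate type
# containing two concrete instances, "A" and "B", with an uncontrolled
# training distribution heavily skewed toward instance A.
instance_probs = {"A": 0.9, "B": 0.1}

# Expected payoff of each of the agent's actions against each instance.
payoff = {
    ("a1", "A"): 1.0, ("a1", "B"): -1.0,  # a1: great vs A, terrible vs B
    ("a2", "A"): 0.2, ("a2", "B"): 0.2,   # a2: mediocre but safe
}

def sample_interactions(n):
    """Draw (instance, action, reward) tuples under the skewed distribution."""
    out = []
    for _ in range(n):
        inst = rng.choice(list(instance_probs), p=list(instance_probs.values()))
        for act in ("a1", "a2"):
            r = payoff[(act, inst)] + rng.normal(0, 0.05)  # noisy feedback
            out.append((inst, act, r))
    return out

data = sample_interactions(10_000)

def naive_value(act):
    """Type-level estimate: average feedback pooled over the whole type."""
    return np.mean([r for _, a, r in data if a == act])

def rectified_value(act):
    """Instance-wise rectification (illustrative): reweight so every
    instance in the type contributes equally, removing the confound
    introduced by the skewed instance distribution."""
    per_instance = [
        np.mean([r for i, a, r in data if i == inst and a == act])
        for inst in instance_probs
    ]
    return np.mean(per_instance)  # uniform weight per instance

for act in ("a1", "a2"):
    print(f"{act}: naive={naive_value(act):+.2f}  rectified={rectified_value(act):+.2f}")
```

Under the naive type-level average, a1 looks best (about +0.80 vs +0.20), yet against instance B alone it is the worst possible choice (-1.0). After per-instance reweighting, a1's estimate drops to about 0.00 and the safe action a2 (+0.20) wins, matching the abstract's point that the "best response for a type" can hide a worst response for specific instances.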
