Automated, generalizable policy augmentation for partner diversity in multiagent RL
Develop an automated, generalizable procedure for generating a diverse set of partner policies to train against in cooperative multiagent reinforcement learning. Such a procedure would play the role that data augmentation plays in supervised learning: agents would experience a wide variety of partner behaviors during training and thereby generalize to unseen partners.
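To make the analogy with data augmentation concrete, the following is a deliberately naive sketch rather than the procedure the problem asks for. It wraps one base partner policy in a few hand-picked perturbations, much as an image might be flipped or cropped. Everything here is an assumption made for illustration: the `Policy` interface (integer observation in, integer action out), the two perturbations (random actions and sticky actions), their parameter values, and the toy `base_policy`. Because the perturbations are chosen by hand, the sketch shows what is meant by "augmenting" a policy but not how to do so automatically or generally.

```python
import random
from typing import Callable, List

# Hypothetical interface: a policy maps an integer observation to an integer action.
Policy = Callable[[int], int]


def with_random_actions(base: Policy, n_actions: int, epsilon: float,
                        rng: random.Random) -> Policy:
    """Follow `base`, but act uniformly at random with probability `epsilon`."""
    def policy(obs: int) -> int:
        if rng.random() < epsilon:
            return rng.randrange(n_actions)
        return base(obs)
    return policy


def with_sticky_actions(base: Policy, stick_prob: float,
                        rng: random.Random) -> Policy:
    """Follow `base`, but repeat the previous action with probability `stick_prob`."""
    last_action = None

    def policy(obs: int) -> int:
        nonlocal last_action
        # Always query the base policy on the first step; afterwards, resample
        # from it only when the action does not stick.
        if last_action is None or rng.random() >= stick_prob:
            last_action = base(obs)
        return last_action
    return policy


def make_partner_pool(base: Policy, n_actions: int, seed: int = 0) -> List[Policy]:
    """Build a small pool of perturbed partners from one base policy."""
    rng = random.Random(seed)
    pool = [base]  # keep the unperturbed partner as well
    for epsilon in (0.1, 0.3):  # hand-picked values, assumed for illustration
        pool.append(with_random_actions(base, n_actions, epsilon, rng))
    for stick_prob in (0.25, 0.5):  # hand-picked values, assumed for illustration
        pool.append(with_sticky_actions(base, stick_prob, rng))
    return pool


def base_policy(obs: int) -> int:
    # Toy stand-in for a trained partner: the action depends only on observation parity.
    return obs % 2


partners = make_partner_pool(base_policy, n_actions=4)
# During training, one partner would be sampled per episode.
partner = random.Random(1).choice(partners)
```

A pool like this only varies the base partner's behavior locally. It does not produce partners with different strategies or intentions, which is part of why the problem remains open.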
References
Similar to data augmentation used in supervised training we would need to "augment" our policies in various ways to produce the widest variety of training partners. Unfortunately it is not clear how to achieve this in an automated and generalizable way.
— Do deep reinforcement learning agents model intentions?
(1805.06020 - Matiisen et al., 2018) in Discussion — Generalization in multiagent setups