ETHER: Aligning Emergent Communication for Hindsight Experience Replay (2307.15494v2)
Abstract: Natural language instruction following is paramount to enable collaboration between artificial agents and human beings. Natural-language-conditioned reinforcement learning (RL) agents have shown how properties of natural language, such as compositionality, can provide a strong inductive bias for learning complex policies. Previous architectures like HIGhER combine the benefits of language conditioning with Hindsight Experience Replay (HER) to deal with sparse-reward environments. Yet, like HER, HIGhER relies on an oracle predicate function to provide a feedback signal indicating which linguistic description is valid for which state, and this reliance on an oracle limits its applicability. Additionally, HIGhER only leverages the linguistic information contained in successful RL trajectories, which hurts its final performance and data efficiency: without early successful trajectories, HIGhER is no better than the DQN upon which it is built. In this paper, we propose the Emergent Textual Hindsight Experience Replay (ETHER) agent, which builds on HIGhER and addresses both of its limitations by means of (i) a discriminative visual referential game, commonly studied in the subfield of Emergent Communication (EC), used here as an unsupervised auxiliary task, and (ii) a semantic grounding scheme to align the emergent language with the natural language of the instruction-following benchmark. We show that the referential game's agents make an artificial language emerge that is aligned with the natural-like language used to describe goals in the BabyAI benchmark, and that this emergent language is expressive enough to also describe unsuccessful RL trajectories, thus providing feedback that lets the RL agent leverage the linguistic, structured information contained in all trajectories. Our work shows that EC is a viable unsupervised auxiliary task for RL and provides missing pieces to make HER more widely applicable.
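To make the HER mechanism the abstract builds on concrete, here is a minimal sketch of hindsight relabelling. The function names `her_relabel` and `describe` are hypothetical, not from the paper: `describe` stands in for the oracle predicate/description function that HER and HIGhER assume (mapping a state to a linguistic goal it satisfies), which is exactly the component ETHER proposes to replace with an emergent-communication speaker.

```python
import random

def her_relabel(trajectory, describe, k=4):
    """Sketch of HER-style hindsight relabelling (names are illustrative).

    `trajectory` is a list of (state, action, next_state) transitions from a
    possibly failed episode. For each transition, up to `k` future states are
    sampled and their descriptions are treated as goals that were achieved in
    hindsight, turning a failed episode into useful, rewarded experience.
    """
    relabelled = []
    for t, (s, a, s_next) in enumerate(trajectory):
        future = trajectory[t:]
        for (_, _, s_goal) in random.sample(future, min(k, len(future))):
            g = describe(s_goal)                        # hindsight goal label
            r = 1.0 if describe(s_next) == g else 0.0   # sparse reward signal
            relabelled.append((s, a, g, r, s_next))
    return relabelled

# Toy example: states are integers, the "description" is their parity.
traj = [(0, "a", 1), (1, "b", 2), (2, "c", 3)]
extra = her_relabel(traj, describe=lambda s: "even" if s % 2 == 0 else "odd", k=2)
```

The design point the abstract makes is that `describe` need not be an oracle: a speaker trained in a discriminative referential game can produce such descriptions for any state, successful or not, which is what lets ETHER relabel all trajectories rather than only the successful ones.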
- TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL https://www.tensorflow.org/. Software available from tensorflow.org.
- Hindsight experience replay. arXiv preprint arXiv:1707.01495, 2017.
- M. Baroni. Linguistic generalization and compositionality in modern artificial neural networks. Mar. 2019. URL http://arxiv.org/abs/1904.00157.
- Emergence of Communication in an Interactive World with Consistent Speakers. Sep. 2018. URL http://arxiv.org/abs/1809.00549.
- D. Bouchacourt and M. Baroni. How agents see things: On visual representations in an emergent language game. Aug. 2018. URL http://arxiv.org/abs/1808.10696.
- G. Bradski. The OpenCV Library. Dr. Dobb’s Journal of Software Tools, 2000.
- H. Brighton. Compositional syntax from cultural transmission. Artificial Life, MIT Press, 2002. URL https://www.mitpressjournals.org/doi/abs/10.1162/106454602753694756.
- Emergent quantized communication. arXiv preprint arXiv:2211.02412, 2022.
- Anti-efficient encoding in emergent communication. NeurIPS, May 2019a. URL http://arxiv.org/abs/1905.12561.
- Word-order biases in deep-agent emergent communication. May 2019b. URL http://arxiv.org/abs/1905.12330.
- Compositionality and Generalization in Emergent Languages. Apr. 2020. URL http://arxiv.org/abs/2004.09124.
- Isolating sources of disentanglement in VAEs. NeurIPS 2018. URL https://papers.nips.cc/paper/2018/file/1ee3dfcd8a0645a25a35977997223d22-Paper.pdf. Accessed: 2021-3-17.
- BabyAI: First Steps Towards Grounded Language Learning With a Human In the Loop. Oct. 2018. URL http://arxiv.org/abs/1810.08272.
- Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
- Compositional Obverter Communication Learning From Raw Visual Input. Apr. 2018. URL http://arxiv.org/abs/1804.02341.
- HIGhER: Improving instruction following with hindsight generation for experience replay. In 2020 IEEE Symposium Series on Computational Intelligence (SSCI), pages 225–232. IEEE, 2020.
- Visual referential games further the emergence of disentangled representations. arXiv preprint arXiv:2304.14511, 2023.
- K. Denamganaï and J. A. Walker. Referentialgym: A framework for language emergence & grounding in (visual) referential games. 4th NeurIPS Workshop on Emergent Communication, 2020a.
- K. Denamganaï and J. A. Walker. On (emergent) systematic generalisation and compositionality in visual referential games with straight-through gumbel-softmax estimator. 4th NeurIPS Workshop on Emergent Communication, 2020b.
- Interpretable agent communication from scratch (with a generic visual processor emerging on the side). May 2021.
- The emergence of compositional languages for numeric concepts through iterated learning in neural agents. arXiv preprint arXiv:1910.05291, 2019.
- Array programming with NumPy. Nature, 585:357–362, 2020. doi: 10.1038/s41586-020-2649-2.
- S. Havrylov and I. Titov. Emergence of Language with Multi-agent Games: Learning to Communicate with Sequences of Symbols. May 2017. URL http://arxiv.org/abs/1705.11192.
- DARLA: Improving Zero-Shot Transfer in Reinforcement Learning. URL https://arxiv.org/pdf/1707.08475.pdf.
- SCAN: Learning Abstract Hierarchical Compositional Visual Concepts. Jul. 2017. URL http://arxiv.org/abs/1707.03389.
- Towards a Definition of Disentangled Representations. Dec. 2018. URL http://arxiv.org/abs/1812.02230.
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
- Distributed prioritized experience replay. arXiv preprint arXiv:1803.00933, 2018.
- T.-W. Huang. Tensorboardx, 2018. URL https://github.com/lanpa/tensorboardX.
- Reinforcement learning with unsupervised auxiliary tasks. In International Conference on Learning Representations, 2016.
- R. Jakobson. Linguistics and poetics. In Style in language, pages 350–377. Cambridge, MA: MIT Press, 1960.
- Language as an Abstraction for Hierarchical Deep Reinforcement Learning. Jun. 2019. URL http://arxiv.org/abs/1906.07343.
- Recurrent experience replay in distributed reinforcement learning. In International conference on learning representations, 2018.
- H. Kim and A. Mnih. Disentangling by factorising. arXiv preprint arXiv:1802.05983, 2018.
- D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- D. P. Kingma and M. Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
- S. Kirby. Learning, bottlenecks and the evolution of recursive syntax. 2002.
- Developmentally motivated emergence of compositional communication via template transfer. Oct. 2019. URL http://arxiv.org/abs/1910.06079.
- Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog. Jun. 2017. URL http://arxiv.org/abs/1706.08502.
- Visual Coreference Resolution in Visual Dialog using Neural Module Networks. Sep. 2018. URL http://arxiv.org/abs/1809.01816.
- Emergence of Linguistic Communication from Referential Games with Symbolic and Pixel Input. Apr. 2018. URL http://arxiv.org/abs/1804.03984.
- D. Lewis. Convention: A philosophical study. 1969.
- F. Li and M. Bowling. Ease-of-Teaching and Language Structure from Emergent Communication. Jun. 2019. URL http://arxiv.org/abs/1906.02403.
- A sober look at the unsupervised learning of disentangled representations and their evaluation. Oct. 2020.
- A survey of reinforcement learning informed by natural language, 2019.
- Playing atari with deep reinforcement learning. CoRR, abs/1312.5602, 2013. URL http://arxiv.org/abs/1312.5602.
- The role of disentanglement in generalisation. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=qbH974jKUVy.
- I. Mordatch and P. Abbeel. Emergence of Grounded Compositional Language in Multi-Agent Populations. URL https://arxiv.org/pdf/1703.04908.pdf.
- Grounded Language Learning in a Simulated 3D World. URL https://arxiv.org/pdf/1706.06551.pdf.
- The pandas development team. pandas-dev/pandas: Pandas, Feb. 2020. URL https://doi.org/10.5281/zenodo.3509134.
- Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8024–8035. 2019. URL http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
- Scikit-learn: Machine learning in python. Journal of Machine Learning Research, 12(85):2825–2830, 2011. URL http://jmlr.org/papers/v12/pedregosa11a.html.
- Film: Visual reasoning with a general conditioning layer. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
- F. Perez and B. E. Granger. Ipython: A system for interactive scientific computing. Computing in Science Engineering, 9(3):21–29, 2007. doi: 10.1109/MCSE.2007.53.
- Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
- Compositional Languages Emerge in a Neural Iterated Learning Model. Feb. 2020. URL http://arxiv.org/abs/2002.01365.
- "LazImpa": Lazy and impatient neural agents learn to communicate efficiently. arXiv preprint arXiv:2010.01878, 2020.
- Improved techniques for training gans. Advances in neural information processing systems, 29, 2016.
- Universal value function approximators. In International conference on machine learning, pages 1312–1320. PMLR, 2015.
- Iterated learning: A framework for the emergence of language. Artificial Life, 9(4):371–389, 2003. URL https://www.mitpressjournals.org/doi/abs/10.1162/106454603322694825.
- Improving generalization for abstract reasoning tasks using disentangled feature representations. Nov. 2018.
- Word length and word frequency. Springer, 2007.
- Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
- scikit-image: image processing in python. PeerJ, 2:e453, 2014.
- G. Van Rossum and F. L. Drake. Python 3 Reference Manual. CreateSpace, Scotts Valley, CA, 2009. ISBN 1441412697.
- Are disentangled representations helpful for abstract visual reasoning? May 2019.
- SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, 17:261–272, 2020. doi: 10.1038/s41592-019-0686-2.
- Dueling network architectures for deep reinforcement learning. In International conference on machine learning, pages 1995–2003. PMLR, 2016.
- W. McKinney. Data Structures for Statistical Computing in Python. In S. van der Walt and J. Millman, editors, Proceedings of the 9th Python in Science Conference, pages 56–61, 2010. doi: 10.25080/Majora-92bf1922-00a.
- R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992.
- Compositional generalization in unsupervised compositional representation learning: A study on disentanglement and emergent language. Oct. 2022.
- G. K. Zipf. Human behavior and the principle of least effort: An introduction to human ecology. Ravenio Books, 2016.