SimSiam Naming Game: A Unified Approach for Representation Learning and Emergent Communication
Abstract: Emergent communication, driven by generative models, enables agents to develop a shared language for describing their individual views of the same objects through interactions. Meanwhile, self-supervised learning (SSL), particularly SimSiam, uses discriminative representation learning to bring the representations of augmented views of the same data point closer together in the representation space. Building on VI-SimSiam, which incorporates a generative and Bayesian perspective into the SimSiam framework via a variational inference (VI) interpretation, we propose SimSiam+VAE, a unified approach for both representation learning and emergent communication. SimSiam+VAE integrates a variational autoencoder (VAE) into the predictor of the SimSiam network to enhance representation learning and capture uncertainty. Experimental results show that SimSiam+VAE outperforms both SimSiam and VI-SimSiam. We further extend this model into a communication framework called the SimSiam Naming Game (SSNG), which applies the VI-based generative and Bayesian approach to develop internal representations and an emergent language, while using the discriminative process of SimSiam to facilitate mutual understanding between agents. In experiments against established models, SSNG performs comparably to the referential game and slightly outperforms the Metropolis-Hastings naming game, even though agents dynamically alternate roles during interactions.
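As a rough illustration of the discriminative objective the abstract refers to, the sketch below computes the symmetric negative cosine similarity used in the original SimSiam formulation. It is not the SimSiam+VAE or SSNG objective, which the abstract does not specify. The function names `negative_cosine` and `simsiam_loss` and the `eps` parameter are assumptions made for this example, and NumPy has no automatic differentiation, so the stop-gradient on the targets appears only as a comment.

```python
import numpy as np


def negative_cosine(p, z, eps=1e-8):
    """Mean negative cosine similarity between matching rows of p and z."""
    # Normalize each row to unit length; eps guards against division by zero.
    p_n = p / (np.linalg.norm(p, axis=1, keepdims=True) + eps)
    z_n = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    return -np.mean(np.sum(p_n * z_n, axis=1))


def simsiam_loss(p1, z1, p2, z2):
    """Symmetric SimSiam-style loss for two augmented views.

    p1, p2: predictor outputs for view 1 and view 2, shape (batch, dim).
    z1, z2: projector outputs for view 1 and view 2, shape (batch, dim).
    In a real training loop z1 and z2 would be wrapped in a stop-gradient
    so that they act as fixed targets (assumption: not shown here).
    """
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)
```

The loss approaches -1 when each view's prediction points in the same direction as the other view's representation, and is 0 when they are orthogonal.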