Referential Games Overview

Updated 18 June 2026

Referential games are structured interactive protocols where one agent sends a message and another identifies a target among distractors.
They are used to investigate language emergence, compositionality, pragmatic inference, and theory of mind in both computational and cognitive research.
Variants such as attribute-based, multi-modal, and iterated games enable detailed analysis of agent coordination and emergent communication patterns.

A referential game is a formal communicative interaction in which one agent (often called the "speaker" or "sender") observes a target referent and emits an utterance, while another agent (the "listener" or "receiver") observes the utterance and attempts to select the correct referent from a candidate set. Referential games provide a versatile, highly structured setting for investigating language emergence, compositionality, pragmatic inference, theory of mind, grounding, and alignment between agents with different perceptual or conceptual spaces. Research in machine learning, linguistics, and cognitive science has used referential games as testbeds for emergent communication, systematic generalization, population dynamics, and human–machine interaction.

1. Formal Structure and Core Variants

The canonical referential game is defined as a tuple of agents, referent set, message space, and decision rules. Let $\mathcal{O}$ denote the set of possible objects (which can be symbolic attribute vectors, images, or real-world entities), and let $\mathcal{M}$ be the discrete message space. Each round proceeds as follows:

A context $C = \{o_1, \dots, o_K\} \subseteq \mathcal{O}$ is sampled (e.g., one target, $K-1$ distractors).
A target $t \in C$ is designated.
The speaker observes $t$ (or, in some variants, the full context) and generates $m \sim \pi_S(\cdot \mid t)$ .
The listener observes $(m, C)$ and selects $\hat{t} \sim \pi_L(\cdot \mid m, C)$ .
Payoff is $1$ if $\hat{t} = t$ and $0$ otherwise, or more generally a reward $R(t, \hat{t})$.
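
The round structure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the toy object set, the hand-written `speaker` and `listener` policies, and the function name `play_round` are all hypothetical, and the listener simply reuses the speaker's code, which learned agents cannot do.

```python
import random

# Hypothetical toy world: each object is a (shape, color) attribute pair.
OBJECTS = [(shape, color)
           for shape in ("circle", "square", "star")
           for color in ("red", "blue")]

def speaker(target):
    """Deterministic stand-in for pi_S: one symbol per attribute."""
    shape, color = target
    return (shape[:2], color[0])

def listener(message, context):
    """Deterministic stand-in for pi_L: pick the candidate matching m."""
    for candidate in context:
        if speaker(candidate) == message:
            return candidate
    return context[0]  # fallback guess when nothing matches

def play_round(objects, k, rng):
    """Play one round and return (target, message, guess, payoff)."""
    context = rng.sample(objects, k)      # C: k distinct candidates
    target = rng.choice(context)          # t in C
    message = speaker(target)             # m from the speaker policy
    guess = listener(message, context)    # t_hat from the listener policy
    payoff = 1 if guess == target else 0  # 1 on success, 0 otherwise
    return target, message, guess, payoff

rng = random.Random(0)
rounds = [play_round(OBJECTS, 3, rng) for _ in range(100)]
success_rate = sum(payoff for *_, payoff in rounds) / len(rounds)
```

Because the hand-written code assigns a distinct message to every object, the listener always recovers the target here; in emergent-communication experiments both policies are learned and the success rate starts near chance ($1/K$).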

Variants include:

Attribute-based: The message space consists of attributes aligned with object factors (Corona et al., 2019).
Symbolic or visual: Inputs are symbolic vectors (Guo et al., 2020, Lazaridou et al., 2018) or raw pixels (Denamganaï et al., 2023).
Multi-modal: Sender and listener may have access to different information modalities (e.g., image vs. text) (Evtimova et al., 2017).
Population games: Pools of agents with varying perceptual, conceptual, or architectural profiles (Dagan et al., 2020, Corona et al., 2019).
Iterated/multi-step: Communication unfolds over multiple back-and-forth utterances (Evtimova et al., 2017, Tan et al., 5 Nov 2025, Willemsen et al., 2023).
Hierarchical and set/concept games: Targets are abstracted over multiple levels, or generalizations over sets are communicated (Ohmer et al., 2022, Mu et al., 2021).

This formalization permits the specification of agent architectures, learning paradigms (policy-gradient, Gumbel-Softmax relaxation, meta-learning), and evaluation metrics.
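
As an illustration of how the policy-gradient paradigm attaches to this formalization, the expected payoff and its score-function (REINFORCE) gradients can be written as follows. The parameter symbols $\theta_S$, $\theta_L$, the reward $R(t, \hat{t})$ (for instance $1$ if $\hat{t} = t$ and $0$ otherwise), and the baseline $b$ are notational assumptions introduced here, not taken from a specific cited paper.

$$
J(\theta_S, \theta_L) = \mathbb{E}_{C,\, t}\; \mathbb{E}_{m \sim \pi_S(\cdot \mid t)}\; \mathbb{E}_{\hat{t} \sim \pi_L(\cdot \mid m, C)} \big[ R(t, \hat{t}) \big]
$$

$$
\nabla_{\theta_S} J = \mathbb{E}\big[ (R(t, \hat{t}) - b)\, \nabla_{\theta_S} \log \pi_S(m \mid t) \big], \qquad
\nabla_{\theta_L} J = \mathbb{E}\big[ (R(t, \hat{t}) - b)\, \nabla_{\theta_L} \log \pi_L(\hat{t} \mid m, C) \big]
$$

Here $b$ is any baseline that does not depend on the sampled $m$ or $\hat{t}$; it leaves the gradient unbiased and is typically chosen to reduce variance.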

2. Model Architectures and Learning Paradigms

Neural referential game agents are typically parameterized by encoders, decoders, and policy modules. Common features include:

Speaker (Sender): Maps target input (image or vector) to a latent representation, then decodes to a message (e.g., via MLP, LSTM, or transformer) (Guo et al., 2020, Evtimova et al., 2017, Corona et al., 2019).
Listener (Receiver): Encodes the candidate objects and maps the message to a hidden representation; the decision is made via similarity scoring (dot product) or an MLP (Guo et al., 2020, Denamganaï et al., 2023).
Multi-agent and population settings: Listeners are parameterized by conceptual understanding profiles or thresholds, allowing for modeling of inter-agent variation and adaptation (Corona et al., 2019, Dagan et al., 2020).
Structured Memory and Pragmatics: Recurrent modules and explicit reference-resolution models support multi-turn and collaborative reasoning (Fried et al., 2021).
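
The dot-product decision rule mentioned for the listener can be sketched as follows. This is an illustrative toy, assuming the message and the candidates have already been encoded into vectors of equal dimension; the function name `listener_distribution` and the example vectors are hypothetical.

```python
import math

def listener_distribution(message_vec, candidate_vecs):
    """Softmax over dot-product scores between message and candidates."""
    scores = [sum(m * c for m, c in zip(message_vec, cand))
              for cand in candidate_vecs]
    peak = max(scores)                       # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three encoded candidates; the first is most aligned with the message.
probs = listener_distribution([1.0, 0.0],
                              [[2.0, 0.0], [0.0, 2.0], [-1.0, 1.0]])
choice = max(range(len(probs)), key=probs.__getitem__)  # greedy selection
```

During training the listener usually samples from `probs` (or is trained with a cross-entropy loss on it) rather than taking the greedy choice.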

Training objectives differ based on communication constraints:

REINFORCE: Policy gradient for discrete sampling (Guo et al., 2020, Lazaridou et al., 2018).
Straight-Through Gumbel-Softmax (STGS): Enables end-to-end backpropagation through discrete message channels (Guo et al., 2020, Denamganaï et al., 2020, Denamganaï et al., 2023).
Obverter: Speaker internally reasons via its own model of the listener to maximize mutual intelligibility (Denamganaï et al., 2023).
Meta-learning: Agents adapt quickly to new semantic spaces via meta-referential games (Denamganaï et al., 2022).
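
To make the Gumbel-Softmax entry above more concrete, the sketch below shows only the forward sampling step for one message symbol, in plain Python. It is an assumption-laden illustration: the function name and arguments are hypothetical, and the straight-through gradient itself needs an automatic-differentiation framework, so it is described in a comment rather than implemented.

```python
import math
import random

def gumbel_softmax_sample(logits, tau, rng):
    """Return (soft, hard): a relaxed sample and its one-hot argmax."""
    # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u)).
    gumbels = [-math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))
               for _ in logits]
    perturbed = [(l + g) / tau for l, g in zip(logits, gumbels)]
    peak = max(perturbed)                    # subtract max for stability
    exps = [math.exp(p - peak) for p in perturbed]
    total = sum(exps)
    soft = [e / total for e in exps]         # differentiable relaxation
    winner = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == winner else 0.0 for i in range(len(soft))]
    # Straight-through variant: send `hard` over the channel in the
    # forward pass, but backpropagate as if `soft` had been sent.
    return soft, hard

rng = random.Random(0)
logits = [math.log(0.7), math.log(0.2), math.log(0.1)]
soft, hard = gumbel_softmax_sample(logits, tau=1.0, rng=rng)
```

Lower temperatures `tau` push `soft` toward the one-hot `hard` vector, while the index of `hard` is distributed according to the softmax of the logits regardless of `tau`.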

The interaction between architecture, inductive bias (e.g., fixed message length, vocabulary size), and training regime strongly influences the emergent communication protocol (Guo et al., 2020, Denamganaï et al., 2020).

3. Compositionality, Expressivity, and Systematicity

A central objective in referential game research is analyzing the structure of emergent languages. Core metrics include:

Topographic Similarity (TopSim): Measures alignment between distances in object (semantic) space and message space; Spearman correlation is typically reported (Guo et al., 2020, Denamganaï et al., 2020, Denamganaï et al., 2023).
Positional Disentanglement (PosDis): Quantifies the degree to which individual message positions encode specific object attributes; original and extended forms are used (Denamganaï et al., 2023, Ohmer et al., 2022).
Normalized Mutual Information (NMI): Captures one-to-one mapping between concepts and messages in hierarchy games (Ohmer et al., 2022).
Language expressivity: Measured via transfer of a protocol to different tasks, e.g., reconstruction (Guo et al., 2020).
Disentanglement Metrics: FactorVAE, MIG, and modularity of neural representations (Denamganaï et al., 2023).
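
As a worked example of the first metric, topographic similarity can be computed as the Spearman correlation between pairwise distances in meaning space and in message space. The sketch below assumes Hamming distance over attribute tuples and Levenshtein distance over messages; these are common choices but are assumptions here, as are all function names.

```python
from itertools import combinations

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def edit_distance(a, b):
    """Levenshtein distance with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def average_ranks(values):
    """Ranks starting at 1, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    # Undefined (division by zero) if either input is constant.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def topsim(meanings, messages):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    pairs = list(combinations(range(len(meanings)), 2))
    d_meaning = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    d_message = [edit_distance(messages[i], messages[j]) for i, j in pairs]
    return pearson(average_ranks(d_meaning), average_ranks(d_message))

meanings = [(0, 0), (0, 1), (1, 0), (1, 1)]
compositional = topsim(meanings, ["ac", "ad", "bc", "bd"])
scrambled = topsim(meanings, ["ac", "bd", "bc", "ad"])
```

In this toy case the code that assigns one symbol per attribute reaches the maximum score, while the scrambled assignment of the same four messages scores lower, which is the contrast TopSim is meant to expose.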

Empirical observations:

Classical referential games yield highly compositional emergent protocols if input structure is disentangled (Guo et al., 2020, Lazaridou et al., 2018, Denamganaï et al., 2022).
Increased channel capacity (longer messages) improves compositionality, but merely increasing vocabulary size can encourage idiosyncratic codes (Denamganaï et al., 2020).
Zero-shot systematicity (generalization to novel combinations) is not guaranteed even for protocols with high compositionality scores, especially in visual or entangled domains (Denamganaï et al., 2023, Denamganaï et al., 2020).
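
Zero-shot systematicity is usually probed by holding out attribute combinations while making sure each individual attribute value still appears during training. The following sketch shows one way to build such a split; the function name, its arguments, and the toy attributes are hypothetical and not drawn from any cited benchmark.

```python
from itertools import product

def compositional_split(values_per_attribute, held_out):
    """Split all attribute combinations into train and zero-shot test."""
    all_objects = list(product(*values_per_attribute))
    held = set(held_out)
    train = [o for o in all_objects if o not in held]
    test = [o for o in all_objects if o in held]
    # Each attribute value in the test set must be seen in training,
    # otherwise the test measures novelty of values, not of combinations.
    for pos in range(len(values_per_attribute)):
        seen = {o[pos] for o in train}
        missing = {o[pos] for o in test} - seen
        if missing:
            raise ValueError(f"values never seen in training: {missing}")
    return train, test

shapes = ("circle", "square", "star")
colors = ("red", "blue", "green")
train, test = compositional_split(
    [shapes, colors], held_out=[("circle", "red"), ("square", "blue")])
```

Agents are then trained on `train` only, and accuracy on `test` measures how well the emergent protocol extends to unseen combinations.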

4. Grounded, Hierarchical, and Generalization Extensions

Extensions of referential games address abstraction, hierarchical reference, and concept generalization:

Hierarchical Reference Games: Agents communicate about abstractions over sets of primitive attributes, producing protocols that support both implicit (symbol omission) and explicit (dedicated tokens) abstraction (Ohmer et al., 2022).
SetRef and Concept Games: Teachers must succinctly communicate categories or Boolean concepts over object sets, leading to increased systematicity and interpretable emergent codes (Mu et al., 2021).
Approximate Compositional Reconstruction: Extraction of logical operation analogs (AND, OR, NOT) from emergent codes via data-driven sequence-to-sequence models (Mu et al., 2021).
Meta-Referential Games and CLBs: Agents are evaluated on their ability to rapidly solve the binding problem and generalize compositionally in new semantic spaces, requiring both receptive and constructive compositional learning behaviors (Denamganaï et al., 2022).

Together, these studies highlight abstraction as a pragmatic driver of compositionality, grounding as essential for agent coordination, and meta-learning as a route to robust symbolic communication.

5. Population Dynamics, Adaptation, and Learnability

Recent studies emphasize the importance of variation and adaptation:

Population of Listeners: Agents must model and adapt to listeners with different perceptual/conceptual semantics, using rapid probing and stateful memory (e.g., LSTM embeddings) (Corona et al., 2019).
Cultural Transmission and Co-evolution: Language learnability is shaped by both cultural selection (replacement of low-fitness agents) and architectural evolution (mutation of neural structures); optimal languages and protocols are those that are both expressive and easily acquired by new agents (Dagan et al., 2020).
Interaction with Novel Partners and Perceptual Mismatches: Protocols built for heterogeneous populations must be robust to divergent perceptual encoders (e.g., ResNet-Speaker, PNASNet-Listener) (Corona et al., 2019).

Learnability experiments in this line of work indicate that co-evolutionary regimes yield highly structured, low-entropy languages that are compact and generalize well.

6. Extensions to Dialogue, Pragmatics, and Human Interaction

Referential games support modeling pragmatic and dialogic interaction:

Collaborative Agreement and Mixed-Initiative Dialogue: Multi-round, role-symmetric games such as "A Game Of Sorts" prioritize collaborative ranking and negotiation, with empirical measurements of dialog balance, initiative, and grounding (Willemsen et al., 2023).
Grounded Collaborative Dialogue: Structured reference resolution, referent memory, and pragmatic generation modules enable agents to track, disambiguate, and refer to spatial entities in dynamic contexts, achieving substantial improvements in task completion (Fried et al., 2021).
Uncertainty and Clarification Requests: Recent VLM experiments leverage referential games to measure models' ability to signal and act upon internal uncertainty through clarification, revealing systematic differences both between models and with respect to human performance (Ali et al., 12 Jan 2026).
Iterated Reference and Context-sensitive Pragmatics: Few-shot, multi-round reference games show that modern VLMs benefit from in-context exemplars but still fail to match human calibration or ad hoc adaptation to abstract referents (Tan et al., 5 Nov 2025).

These results underscore the value of referential games as testbeds for pragmatic reasoning, interactive competence, and human-aligned emergent communication in artificial systems.

7. Practical Frameworks and Implementation Toolchains

Standardized toolkits and benchmarks drive reproducibility:

ReferentialGym: Taxonomizes and implements a wide variety of referential game paradigms (full/partial observability, variable length, multi-modal, discriminative/generative), provides agent class APIs, integrates major training algorithms (REINFORCE, Gumbel-Softmax, Obverter), and ships with evaluation modules for compositionality, ambiguity, and coordination (Denamganaï et al., 2020).
Baseline Experiments: Prototypical experiments demonstrate the effect of architecture and optimization (fixed vocab vs. variable length, pretrained feature extractors vs. end-to-end training, inclusion of auxiliary objectives) on emergent protocols and their semantic properties (Denamganaï et al., 2020, Mihai et al., 2019).
Downstream Applications: Protocols and encodings learned from referential games have been used for vision transformer acceleration, pretraining, and interpretable mid-level patch selection (Gupta et al., 2021).

These frameworks facilitate rapid exploration of design choices, evaluation of emergent language properties, and cross-domain comparison of communication protocols.

In sum, referential games represent a foundational methodology for investigating communication, coordination, compositionality, and pragmatic adaptation from both a computational and cognitive perspective. The framework's flexibility—in message space, learning paradigm, population composition, and evaluation—has made it central to state-of-the-art research on emergent language and meaningful alignment in both artificial agents and human-machine systems (Corona et al., 2019, Guo et al., 2020, Evtimova et al., 2017, Mu et al., 2021, Ohmer et al., 2022, Willemsen et al., 2023, Fried et al., 2021, Denamganaï et al., 2020, Dagan et al., 2020, Denamganaï et al., 2023, Gupta et al., 2021, Tan et al., 5 Nov 2025, Ali et al., 12 Jan 2026, Denamganaï et al., 2022).