Reference Games: Theory & Applications

Updated 11 April 2026

Reference games are interactive tasks defined by a structured mapping between referents and signals; they enable the study of communication, pragmatics, and game theory.
They are formalized through speaker and listener functions and evaluated with metrics such as resolution accuracy and clarification rate across visual, spatial, and language tasks.
Reference games support research in language learning and cognitive science by modeling convention formation, pragmatic reasoning, and equilibrium strategies in interactive settings.

Reference games are a class of interactive tasks and formal models in which a speaker and a listener coordinate to resolve reference, typically within a set of possible referents, using language or other signaling mechanisms. Reference games serve as foundational testbeds for research in linguistics, cognitive science, vision–language modeling, pragmatics, and game theory. Their mathematically tractable structure enables careful measurement of communicative competence, pragmatic adaptation, model uncertainty, convention formation, and the alignment between language use and semantic grounding.

1. Formal Structure and Variants

Reference games instantiate a collaborative mapping between referential acts and intended targets. At its core, a reference game is defined by a tuple (𝒪, 𝓜, S, L) whose components have explicit roles:

Referents (Objects): A finite set 𝒪 = {o₁,...,oₙ} of candidate objects, stimuli, or options
Messages/Utterances: Set 𝓜, typically free-form language, labels, or attribute phrases
Speaker function S: S: 𝒪 → 𝓜, producing u = S(t) for target t∈𝒪
Listener function L: L: 𝓜 × 𝒪ⁿ → 𝒪, selecting or inferring the referent given a message

The game unfolds in discrete rounds: a target is selected, the speaker produces a message, and the listener must choose the referent or take an action (e.g., request clarification). This framework generalizes the “Lewis signaling game” and forms the basis for formal pragmatic analyses, learning experiments, and computational models (Ali et al., 12 Jan 2026).
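
A single round of this loop can be sketched in a few lines. The sketch below is illustrative only and rests on assumptions that are not part of the formal definition: referents are represented as sets of attribute words, the speaker sees the full context (rather than only the target, as in S: 𝒪 → 𝓜), and the names `literal_speaker`, `literal_listener`, and `play_round` are hypothetical.

```python
from typing import Callable, FrozenSet, Optional, Sequence, Tuple

# Assumption for illustration: each referent is a set of attribute words.
Referent = FrozenSet[str]
Speaker = Callable[[Referent, Sequence[Referent]], str]
Listener = Callable[[str, Sequence[Referent]], Optional[Referent]]


def literal_speaker(target: Referent, context: Sequence[Referent]) -> str:
    """Name the target attribute that the fewest distractors share."""
    def distractors_sharing(attr: str) -> int:
        return sum(attr in o for o in context if o != target)
    # sorted() makes tie-breaking deterministic (alphabetical order).
    return min(sorted(target), key=distractors_sharing)


def literal_listener(message: str, context: Sequence[Referent]) -> Optional[Referent]:
    """Choose the first referent the message is true of, or None if none fits."""
    matches = [o for o in context if message in o]
    return matches[0] if matches else None


def play_round(
    target: Referent,
    context: Sequence[Referent],
    speaker: Speaker,
    listener: Listener,
) -> Tuple[str, bool]:
    """One round: the speaker encodes the target, the listener decodes it."""
    message = speaker(target, context)
    guess = listener(message, context)
    return message, guess == target


context = [
    frozenset({"red", "circle"}),
    frozenset({"red", "square"}),
    frozenset({"blue", "square"}),
]
print(play_round(context[0], context, literal_speaker, literal_listener))
print(play_round(context[1], context, literal_speaker, literal_listener))
```

In this toy context the first round succeeds with the message "circle". The second round fails: every attribute of the red square is shared with a distractor, so the single-word message "red" is ambiguous and the listener resolves it to the wrong object. This is the kind of situation that motivates richer messages or clarification requests.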

Reference games manifest in multiple modalities:

Visual reference games: Objects are images or 3D shapes; messages are attribute phrases or natural-language descriptions (Su et al., 2017, Koo et al., 2021).
Location games: Players select points in a spatial domain to attract resources or agents, incorporating “reference locations” and deviation costs to model policy formation or differentiation (Gaëtan et al., 2022).
Language-only games: Words or phrases must unambiguously signal referents within a set of options, sometimes under strong resource or knowledge constraints (Shen et al., 2018).

2. Theoretical Foundations and Semantic Models

Reference games operationalize and extend foundational ideas from semantics and game theory. In classical semantics, meaning is truth-conditional; reference games provide measurable operational definitions of “reference resolution.” Game-theoretic semantics recast communication as interactive strategy selection, encompassing both literal encoding and pragmatic reasoning.

In game-theoretic terms, solutions to reference games correspond to equilibria: fixed points where rational agents’ strategies maximize their chances of successful reference. For example:

Nash equilibria in spatial reference games balance market capture and deviation cost, capturing agglomeration vs. differentiation (see Section 3) (Gaëtan et al., 2022).
Pragmatic reasoning is captured by recursive models such as the Rational Speech Act (RSA) framework, in which speakers simulate listeners and adjust their messages accordingly (Shen et al., 2018). However, empirical findings often show that in resource-limited or strongly associative tasks, literal strategies dominate over higher-order pragmatic recursion.
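
The recursion behind RSA-style models can be made concrete with a small sketch. It assumes the common textbook formulation (a literal listener L0, a speaker S1 with rationality parameter `alpha` and zero message cost, and a pragmatic listener L1); the two-object lexicon is a made-up example, and none of the function names come from the cited work.

```python
from typing import List

Matrix = List[List[float]]


def normalize(weights: List[float]) -> List[float]:
    """Scale non-negative weights to sum to 1 (all zeros stay all zeros)."""
    total = sum(weights)
    return [w / total if total > 0 else 0.0 for w in weights]


def literal_listener(lexicon: Matrix, prior: List[float]) -> Matrix:
    """L0(o | m) is proportional to [[m is true of o]] * P(o). Rows are messages."""
    return [normalize([row[o] * prior[o] for o in range(len(prior))]) for row in lexicon]


def pragmatic_speaker(l0: Matrix, alpha: float = 1.0) -> Matrix:
    """S1(m | o) is proportional to L0(o | m) ** alpha. Rows are referents."""
    n_messages, n_objects = len(l0), len(l0[0])
    return [
        normalize([l0[m][o] ** alpha for m in range(n_messages)])
        for o in range(n_objects)
    ]


def pragmatic_listener(s1: Matrix, prior: List[float]) -> Matrix:
    """L1(o | m) is proportional to S1(m | o) * P(o). Rows are messages."""
    n_objects, n_messages = len(s1), len(s1[0])
    return [
        normalize([s1[o][m] * prior[o] for o in range(n_objects)])
        for m in range(n_messages)
    ]


# Hypothetical lexicon: object 0 wears glasses; object 1 wears glasses and a hat.
# Rows: messages ("glasses", "hat"); columns: objects.
lexicon = [[1.0, 1.0], [0.0, 1.0]]
prior = [0.5, 0.5]

l0 = literal_listener(lexicon, prior)
s1 = pragmatic_speaker(l0)
l1 = pragmatic_listener(s1, prior)
print(l0[0], l1[0])
```

Here the literal listener is indifferent between the two objects on hearing "glasses", whereas the pragmatic listener assigns probability 0.75 to the object without a hat, because a speaker referring to the other object would more likely have said "hat".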

The nominal game semantics tradition extends reference game principles to the semantics of programming languages with general references and state, using the theory of nominal sets and fully abstract models for higher-order interaction (0907.4477).

3. Empirical Methodologies and Experimental Paradigms

A broad suite of reference game paradigms supports empirical investigation of communicative adaptation, convention formation, and model performance:

Pairwise and trio visual games: As in “Reasoning about Fine-grained Attribute Phrases using Reference Games,” speakers describe how a target image differs from distractors; listeners guess the target, enabling measurement of fine-grained attribute learning (Su et al., 2017).
Repeated games: Systematic study of convention emergence and efficiency gains over repeated interaction with fixed partners, as in dyadic tangram tasks. Analyses quantify structural adaptation (e.g., length reduction, syntactic clustering, discriminative word persistence) and the role of feedback (Hawkins et al., 2019).
Clarification-protocol games: Listeners can explicitly request clarification if uncertain, providing a testbed for model uncertainty calibration, clarification triggering, and dialogue robustness (Ali et al., 12 Jan 2026).
Associative reference games: Designed with minimal compositionality, these isolate the use of collocational, distributional, or graph-based association as the sole resource for reference, as in Codenames-style tasks (Shen et al., 2018).
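
One simple way to operationalize the clarification option is an uncertainty threshold on the listener's belief over referents. The sketch below is an assumption made for illustration, not the protocol of the cited study: the entropy criterion, the `threshold_bits` parameter, and the `CLARIFY` sentinel are all hypothetical.

```python
import math
from typing import List, Union

CLARIFY = "CLARIFY"  # hypothetical sentinel for a clarification request


def entropy_bits(belief: List[float]) -> float:
    """Shannon entropy (in bits) of a probability distribution over referents."""
    return -sum(p * math.log2(p) for p in belief if p > 0)


def listener_action(belief: List[float], threshold_bits: float = 0.9) -> Union[int, str]:
    """Commit to the most probable referent, or ask for clarification when
    the belief is too diffuse (entropy above the threshold)."""
    if entropy_bits(belief) > threshold_bits:
        return CLARIFY
    return max(range(len(belief)), key=belief.__getitem__)


print(listener_action([0.90, 0.05, 0.05]))  # peaked belief: commit to index 0
print(listener_action([0.40, 0.35, 0.25]))  # diffuse belief: request clarification
```

The threshold trades off unnecessary questions against avoidable errors; choosing it well is a calibration problem in its own right.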

Underlying these paradigms are specific metrics:

Reference resolution accuracy, relaxed accuracy, clarification request rate (Ali et al., 12 Jan 2026)
Attribute phrase recall, listener/speaker cross-modal retrieval, part-level attention maps (Su et al., 2017, Koo et al., 2021)
Mixed-effects regression on linguistic adaptation, discriminative word survival, entropy/perplexity over repeated games (Hawkins et al., 2019)
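
The first two metrics are straightforward to compute from a log of trials. The sketch below adopts one possible convention, stated here as an assumption: clarification rate is measured over all trials, and resolution accuracy only over trials in which the listener committed to a guess. The `Trial` record and function names are hypothetical, and relaxed accuracy is omitted because its definition is specific to the cited study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    target: int           # index of the intended referent
    guess: Optional[int]  # listener's choice; None means a clarification request


def clarification_rate(trials: List[Trial]) -> float:
    """Fraction of all trials in which the listener asked for clarification."""
    return sum(t.guess is None for t in trials) / len(trials)


def resolution_accuracy(trials: List[Trial]) -> float:
    """Fraction of committed guesses that match the target (0.0 if none)."""
    answered = [t for t in trials if t.guess is not None]
    if not answered:
        return 0.0
    return sum(t.guess == t.target for t in answered) / len(answered)


log = [Trial(0, 0), Trial(1, 2), Trial(2, None), Trial(1, 1)]
print(clarification_rate(log), resolution_accuracy(log))
```

Reporting both numbers together matters: a listener can inflate accuracy on answered trials simply by asking for clarification whenever it is unsure.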

4. Computational, Cognitive, and Pragmatic Insights

Reference games furnish deep insight into human–model alignment, representation learning, and the nature of linguistic and visual meaning:

Fine-grained discrimination and compositionality: Data from contrastive attribute-phrase reference games enable construction of attribute-embeddings that outperform hand-engineered attribute sets on downstream classification tasks, yield human-interpretable concept spaces, and support zero-shot generalization of part segmentation (Su et al., 2017, Koo et al., 2021).
Meta-cognitive capacities in agents: Clarification-enabled protocols reveal a fundamental limitation of even state-of-the-art vision-language models: they fail to translate diffuse internal beliefs into actionable clarification requests, in contrast to humans' robust calibration and context-sensitive help-seeking (Ali et al., 12 Jan 2026).
Role of feedback and adaptation: Corpus analyses of repeated reference games demonstrate path dependence, within-pair semantic convergence, cross-pair divergence, and chunk-based syntactic drop-out. Discriminative words persist; ad hoc conventions emerge stably but remain highly idiosyncratic across pairs (Hawkins et al., 2019).
Modeling other agents’ conceptual understanding: Reasoning about population-level variation in perceptual and conceptual machinery is critical for robust speaker performance in heterogeneous populations of listeners (Corona et al., 2019).
Empirical boundaries of pragmatic reasoning: In associative, one-shot reference games, people rely predominantly on direct collocational fit; literal “bigram” metrics explain human behavior better than recursive pragmatic models, even in configurations optimized for distinguishing model predictions (Shen et al., 2018).

5. Reference Games in Game Theory and Formal Semantics

Beyond language and vision applications, reference-game concepts foreground the role of reference points, cost structures, and equilibrium selection in game-theoretic settings:

Location games with reference points: Each player $i$ selects a position $x_i \in [0,1]$, pays a strictly convex deviation cost $\gamma(|x_i - r_i|)$ for departing from an exogenous reference $r_i$, and captures local market share. Equilibria exist only for specific parameter regimes, with explicit thresholds depending on the quadratic cost parameter. Interior agents ideally match their reference, while boundary agents may deviate to capture additional market share (Gaëtan et al., 2022).
Nominal games: Provide a fully abstract compositional semantics for languages with higher-order references. Moves are annotated with fresh names; strategies embody program behaviors with precise handling of name allocation, store, and context equivalence (0907.4477).
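
The trade-off in the location game between market capture and deviation cost can be illustrated numerically. The sketch below is a simplified stand-in rather than the model of the cited paper: it assumes consumers uniformly distributed on [0, 1] who patronize the nearest player, co-located players splitting their share equally, and a quadratic cost c·(x_i − r_i)² with a hypothetical parameter `c`.

```python
from typing import List


def market_shares(positions: List[float]) -> List[float]:
    """Share of a uniform [0, 1] market won by each player when consumers
    go to the nearest position; co-located players split their share."""
    distinct = sorted(set(positions))
    share_at = {}
    for k, x in enumerate(distinct):
        # Each location serves the interval between the midpoints to its neighbors.
        left = 0.0 if k == 0 else (distinct[k - 1] + x) / 2
        right = 1.0 if k == len(distinct) - 1 else (x + distinct[k + 1]) / 2
        share_at[x] = right - left
    return [share_at[x] / positions.count(x) for x in positions]


def payoffs(positions: List[float], references: List[float], c: float) -> List[float]:
    """Market share minus quadratic cost of deviating from the reference point."""
    shares = market_shares(positions)
    return [s - c * (x - r) ** 2 for s, x, r in zip(shares, positions, references)]


references = [0.25, 0.75]
print(payoffs([0.25, 0.75], references, c=1.0))  # both players at their references
print(payoffs([0.50, 0.75], references, c=1.0))  # player 0 moves toward the center
print(payoffs([0.50, 0.75], references, c=4.0))  # same move under a higher cost
```

Under these assumptions, with c = 1 player 0 gains by moving from 0.25 to 0.5 (payoff 0.5625 instead of 0.5), whereas with c = 4 the same move lowers the payoff to 0.375. This shows in miniature why the cost parameter governs whether players stay anchored to their reference points.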

Reference Game Paradigm	Communicative Role	Key Metrics/Findings
Visual attribute game (Su et al., 2017)	Attribute-level distinction	20% ↑ in classification accuracy, compositionality
Clarification-enabled (Ali et al., 12 Jan 2026)	Uncertainty, clarification	Humans: 92% accuracy, models: sparse clarification
Repeated tangram (Hawkins et al., 2019)	Convention and adaptation	30% distinctive word persistence, path dependence
Spatial location (Gaëtan et al., 2022)	Reference equilibrium	Thresholds for unique equilibrium, social inefficiency, anchoring
Associative Codenames (Shen et al., 2018)	Collocational fit	Literal bigram explains human choice; RSA underfits

6. Applications and Ongoing Research Directions

Reference games now underpin research and benchmarking in multiple domains:

Grounded language learning: Supervision from reference tasks aligns multimodal embeddings and supports few-shot and zero-shot transfer of linguistic concepts to novel visual domains (Su et al., 2017, Koo et al., 2021).
Interactive evaluation of LLMs: Reference games serve as minimal testbeds for probing calibration, explanation, and dialogue repair strategies in LLMs and VLMs (Ali et al., 12 Jan 2026).
Computational pragmatics and cognitive science: Quantitative benchmarks and open corpora from reference games enhance models of convention formation, feedback-driven language change, and adaptive communication (Hawkins et al., 2019).
Game-theoretic policy and mechanism design: Location reference games generalize classic Hotelling-Downs competition, with implications for spatial voting, retail, and platform design (Gaëtan et al., 2022).
Denotational semantics: Nominal games supply compositional tools for analyzing references in programming languages, especially in the presence of state and name generation (0907.4477).

Ongoing work aims to bridge levels of reasoning (pragmatic, associative, compositional), incorporate richer feedback signals, handle partner- and population-level diversity, and design interactive tasks that better probe the pragmatic and meta-cognitive competencies of language-enabled agents.

7. Significance and Theoretical Implications

Reference games offer a rigorously analyzable interface between communicative theory and observable behavior, supporting both foundational and practical insights:

They clarify boundaries where literal meaning, associative structure, and pragmatic inference predominate (Shen et al., 2018).
Conventions in reference games emerge not by global optimization, but by local adaptation, feedback, and random symmetry-breaking within communicative pairs (Hawkins et al., 2019).
Componential attribute models learned in visual reference games furnish high-precision, human-interpretable representations broadly useful in AI, linguistics, and cognitive science (Su et al., 2017, Koo et al., 2021).
Game-theoretic reference games with explicit reference points reveal the dependence of equilibrium uniqueness, social welfare, and adaptation on structural cost parameters and agent configuration (Gaëtan et al., 2022).
The challenges observed in current models on reference games expose critical gaps in self-calibration, clarification, and adaptivity in interactive, situated language use (Ali et al., 12 Jan 2026).

Overall, reference games provide a unified, controllable, and empirically rich setting for analyzing the mechanisms underpinning reference, adaptation, and the emergence of shared meaning across natural and artificial systems.