
Joint Attention Naming Game

Updated 15 April 2026
  • Joint Attention Naming Game is a decentralized Bayesian framework for symbol negotiation using joint attention and probabilistic acceptance mechanisms.
  • It employs a generative Bayesian graphical model and local sampling techniques to align shared representations between interacting agents.
  • Empirical outcomes reveal superior categorization accuracy and robust human–AI agreement compared to baseline models.

The Joint Attention Naming Game (JA-NG) is a formal framework and experimental paradigm for modeling the emergence of shared symbols (signs) between two agents, biological or artificial, under the condition of joint attention. JA-NG combines a Metropolis–Hastings-based probabilistic acceptance mechanism with a generative Bayesian graphical model, and has become a standard testbed for both computational and human-in-the-loop studies of decentralized symbol negotiation. It is the canonical empirical instantiation of the Metropolis–Hastings Naming Game (MHNG), providing concrete evidence that symbol emergence can be interpreted as decentralized Bayesian inference over shared representations (Okumura et al., 2023, Taniguchi et al., 2022, Okumura et al., 18 Jun 2025).

1. Formal Structure and Bayesian Framework

In JA-NG, two agents (A and B, or Human and AI) alternate between the roles of speaker and listener. In each round, both agents are presented with the same referent (object $x_n$), which realizes joint attention, and each independently assigns it to a perceptual category $c^*_n$. The speaker samples a proposed sign $s^*_n \sim P(s_n \mid \Theta^{Sp}, c^{Sp}_n)$ from its internal parameters; the listener then evaluates the proposal and accepts or rejects it according to a probabilistic criterion.

The interaction is formalized as an approximate decentralized inference process in an interpersonal probabilistic graphical model (Inter-PGM). The model's variables are the shared sign $s_n$, the agent-specific perceptual category $c^*_n$ and observation $x^*_n$, the sign–category parameters $\Theta^*$, and the category–observation parameters $\Phi^*$, where the superscript $*$ on these agent-specific quantities indexes the agent ($A$ or $B$). The generative process is:

  • $s_n \sim P(s_n \mid \gamma)$
  • $\Theta^* \sim P(\Theta^* \mid \alpha),\;\; \Phi^* \sim P(\Phi^* \mid \beta)$
  • $c^*_n \sim P(c^*_n \mid s_n, \Theta^*)$
  • $x^*_n \sim P(x^*_n \mid c^*_n, \Phi^*)$

Symbol negotiation thus targets the joint posterior over signs given both agents' observations, $P(s_n \mid x^A_n, x^B_n)$; under MHNG conditions, the dyad implements an approximate Metropolis–Hastings sampler for this distribution (Okumura et al., 2023, Okumura et al., 18 Jun 2025).
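To make the generative process concrete, the following is a minimal sketch of ancestral sampling from a model of this shape. It is an illustration, not the cited authors' implementation: the uniform sign prior, the symmetric Dirichlet prior on each row of $\Theta^*$, the one-dimensional Gaussian observation model for $\Phi^*$, and all function names are assumptions introduced here.

```python
import random

def sample_dirichlet(alpha, rng):
    # Dirichlet draw via normalized Gamma variates.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(num_objects, num_signs, num_categories, rng):
    # P(s_n | gamma): uniform over signs (assumption).
    sign_prior = [1.0 / num_signs] * num_signs
    agents = {}
    for name in ("A", "B"):
        # Theta^*: for each sign, a distribution over the agent's categories.
        theta = [sample_dirichlet([1.0] * num_categories, rng)
                 for _ in range(num_signs)]
        # Phi^*: (mean, std) of a 1-D Gaussian per category (assumption).
        phi = [(rng.gauss(0.0, 5.0), 1.0) for _ in range(num_categories)]
        agents[name] = (theta, phi)
    data = []
    for _ in range(num_objects):
        s = sample_categorical(sign_prior, rng)      # shared sign s_n
        row = {"s": s}
        for name, (theta, phi) in agents.items():
            c = sample_categorical(theta[s], rng)    # c^*_n given s_n, Theta^*
            mean, std = phi[c]
            row[name] = (c, rng.gauss(mean, std))    # x^*_n given c^*_n, Phi^*
        data.append(row)
    return data

samples = generate(num_objects=5, num_signs=3, num_categories=4,
                   rng=random.Random(0))
```

Each entry of `samples` holds the shared sign together with each agent's own category and observation, which mirrors the point that the sign is the only variable the two agents share.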

2. Metropolis–Hastings Acceptance Logic

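At the core of JA-NG is the Metropolis–Hastings acceptance probability governing the listener's response. Given a speaker's proposal $s^*_n$, the listener computes:

$$r = \min\left(1, \frac{P(c^{Li}_n \mid \Theta^{Li}, s^*_n)}{P(c^{Li}_n \mid \Theta^{Li}, s^{Li}_n)}\right)$$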

This ratio measures the relative compatibility of the listener’s current category structure with the proposed versus the existing sign; it is functionally equivalent to an MH update for the sign posterior in the joint Bayesian model (Okumura et al., 2023, Taniguchi et al., 2022, Okumura et al., 18 Jun 2025). Acceptance of proposals with this probability ensures that, over multiple roles and rounds, the empirical sign distribution approximates the shared posterior targeted in Bayesian data fusion.
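The listener's decision can be sketched in a few lines. This is a hedged illustration that assumes the acceptance probability is the ratio of the likelihood of the listener's own category under the proposed sign to its likelihood under the listener's current sign, capped at 1, and that $\Theta^{Li}$ is stored as a table `theta_listener[sign][category]`. The function names and the table layout are hypothetical.

```python
import random

def mh_acceptance_prob(theta_listener, category, proposed_sign, current_sign):
    """Return min(1, P(c | Theta_Li, s_proposed) / P(c | Theta_Li, s_current))."""
    p_new = theta_listener[proposed_sign][category]
    p_old = theta_listener[current_sign][category]
    if p_old == 0.0:
        # Convention chosen here: a current sign with zero likelihood
        # is always replaced.
        return 1.0
    return min(1.0, p_new / p_old)

def listener_decides(theta_listener, category, proposed_sign, current_sign, rng):
    # Accept the proposal with probability r; otherwise keep the current sign.
    r = mh_acceptance_prob(theta_listener, category, proposed_sign, current_sign)
    return proposed_sign if rng.random() < r else current_sign

# Toy table: rows are signs, columns are the listener's categories.
theta = [[0.8, 0.2],
         [0.4, 0.6]]
r = mh_acceptance_prob(theta, category=0, proposed_sign=1, current_sign=0)
# Here r = 0.4 / 0.8 = 0.5, so the proposal is accepted about half the time.
```

A proposal that fits the listener's category better than the current sign is always accepted; a worse-fitting one is accepted only in proportion to how much worse it is.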

The generative model and acceptance update can be extended using deep generative models (for example, inter-GMM+VAE), enabling symbol emergence with high-dimensional or multimodal perceptual input (Taniguchi et al., 2022).

3. Experimental Protocols and Model Instantiations

JA-NG has been implemented in human–human, human–AI, and pure AI–AI settings to empirically test decentralized Bayesian symbol emergence. Protocols are characterized by:

  • Alternating speaker/listener roles across $N$ objects and multiple rounds
  • Joint presentation of perceptually ambiguous referents (e.g., color patches drawn from overlapping Gaussians, digits (MNIST), fruits (Fruits 360))
  • Internal representations as agent-specific Gaussian mixture models or VAEs

A canonical experimental protocol involves initializing category and sign assignments, conducting repeated communication rounds (each with speaker proposal, listener accept/reject, and parameter updates), and recording all decisions. In human–AI studies, AI agents may use MH acceptance, always-accept (supervised learning mimic), or always-reject (unsupervised learning mimic) behavioral policies (Okumura et al., 18 Jun 2025).
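The protocol above can be sketched as a small simulation loop with the three listener policies. This is a simplified illustration under stated assumptions, not the experimental code: categories are held fixed, $\Theta$ is re-estimated as smoothed counts rather than sampled from its posterior, both agents use the same policy, and every name (`play`, `estimate_theta`, the policy strings) is hypothetical.

```python
import random

def estimate_theta(signs, cats, num_signs, num_cats, alpha=1.0):
    # Smoothed sign-to-category table; a stand-in for a posterior sample.
    counts = [[alpha] * num_cats for _ in range(num_signs)]
    for s, c in zip(signs, cats):
        counts[s][c] += 1.0
    return [[v / sum(row) for v in row] for row in counts]

def propose(theta, category, rng):
    # Speaker samples a sign with weight P(c | s), i.e. a uniform sign prior.
    weights = [theta[s][category] for s in range(len(theta))]
    return rng.choices(range(len(theta)), weights=weights, k=1)[0]

def accept_prob(policy, theta, category, proposed, current):
    if policy == "always_accept":
        return 1.0
    if policy == "always_reject":
        return 0.0
    return min(1.0, theta[proposed][category] / theta[current][category])

def play(cats_a, cats_b, num_signs, num_cats, rounds, policy, rng):
    n = len(cats_a)
    agents = [
        {"cats": cats_a, "signs": [rng.randrange(num_signs) for _ in range(n)]},
        {"cats": cats_b, "signs": [rng.randrange(num_signs) for _ in range(n)]},
    ]
    log = []  # one record per decision: (round, object, proposal, accepted)
    for rnd in range(rounds):
        sp, li = agents[rnd % 2], agents[(rnd + 1) % 2]  # roles alternate
        theta_sp = estimate_theta(sp["signs"], sp["cats"], num_signs, num_cats)
        for i in range(n):
            theta_li = estimate_theta(li["signs"], li["cats"], num_signs, num_cats)
            proposal = propose(theta_sp, sp["cats"][i], rng)
            r = accept_prob(policy, theta_li, li["cats"][i],
                            proposal, li["signs"][i])
            accepted = rng.random() < r
            if accepted:
                li["signs"][i] = proposal
            log.append((rnd, i, proposal, accepted))
    return agents, log

cats = [0, 0, 1, 1, 2, 2]
agents, log = play(cats, cats, num_signs=3, num_cats=3, rounds=4,
                   policy="mh", rng=random.Random(1))
```

Switching `policy` to `"always_accept"` or `"always_reject"` gives the two baseline behaviors, so the same loop can be used to compare all three conditions on identical inputs.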

4. Empirical Outcomes and Model Comparisons

Empirical studies consistently show that human and AI listeners track the Metropolis–Hastings criterion: observed acceptance rates increase monotonically with the MH acceptance probability $r$, closely matching predicted psychometric curves ([functional form not recoverable], with [parameters not recoverable] fitted). Fitted values are reported for human–human dyads in Okumura et al. (2023) and for human–AI pairs in Okumura et al. (18 Jun 2025) [values not recoverable]. MH-based models provide statistically superior prediction of trial-level decisions compared to the alternatives (constant, numerator-only, subtraction, and binary models).

Quantitatively, MHNG-based JA-NG yields:

  • Higher categorization accuracy (Adjusted Rand Index, ARI) and greater convergence in sign usage relative to always-accept or always-reject baselines
  • Strong agent–agent or human–AI agreement with posterior sign histograms (agreement >0.76 for MH, significantly exceeding baselines)
  • MNIST and Fruits 360 results show ARI [value not recoverable] with [value not recoverable] for shared signs (Taniguchi et al., 2022)
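For readers unfamiliar with the metric, the Adjusted Rand Index compares two labelings of the same objects while correcting for chance agreement. The following is a minimal standard-library sketch of the usual pair-counting formula; it is offered only to illustrate the metric and is not the evaluation code of the cited studies.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI: 1.0 for identical partitions, about 0 for chance."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree
    cells = Counter(zip(labels_true, labels_pred))  # contingency table
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    sum_cells = sum(comb(v, 2) for v in cells.values())
    sum_rows = sum(comb(v, 2) for v in rows.values())
    sum_cols = sum(comb(v, 2) for v in cols.values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)

# Label names do not matter, only the grouping they induce.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index depends only on the induced partition, it suits this setting, where an agent's internal category indices carry no meaning outside the agent.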

A summary of core empirical findings is provided below.

| Study | Partner Types | Outcome: Accuracy (ARI) | MH vs. Baseline |
|---|---|---|---|
| Human–Human | MH | [values not recoverable] | MH > Constant, Num., Bin. |
| Human–AI | MH, AA, AR | MH: ARI [value not recoverable] | MH > AA > AR |
| AI–AI | MHNG + VAE | ARI [values not recoverable] | MHNG > all-accept, unsup. |

5. Decentralized Bayesian Inference and Predictive Coding

JA-NG constitutes an empirical realization of “collective predictive coding”: multiple agents dynamically adjust their internal concept–sign and category–percept mappings to minimize prediction error (free energy) not only with respect to their own observations, but also in reaction to proposals from their partner. Symbol emergence thereby becomes a distributed inference process—joint statistical alignment on a shared latent variable (the sign system)—rather than unilateral learning or explicit referential agreement (Taniguchi et al., 2022, Okumura et al., 18 Jun 2025).

The process is fundamentally decentralized; no agent has direct access to the partner’s observations or priors. Agreement on signs and categories is achieved via local sampling (Gibbs for continuous latents, MH for signs) and repeated reciprocation, operationalizing distributed Bayesian data fusion.
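The reason the listener can carry out the MH step with purely local information can be seen in a short derivation. It is a sketch under assumptions made here: categories and parameters are treated as fixed during the sign update, the sign prior is $P(s \mid \gamma)$, and $s$ denotes the listener's current sign while $s^\star$ denotes the speaker's proposal.

```latex
\begin{align*}
% Target: posterior over the sign given both agents' categories
\pi(s) &\propto P(c^{Sp}_n \mid \Theta^{Sp}, s)\,
                P(c^{Li}_n \mid \Theta^{Li}, s)\, P(s \mid \gamma) \\
% Proposal: the speaker's own posterior, independent of the current sign
Q(s^\star \mid s) &= P(s^\star \mid \Theta^{Sp}, c^{Sp}_n)
  \propto P(c^{Sp}_n \mid \Theta^{Sp}, s^\star)\, P(s^\star \mid \gamma) \\
% MH ratio: all speaker-side and prior terms cancel
\frac{\pi(s^\star)\, Q(s \mid s^\star)}{\pi(s)\, Q(s^\star \mid s)}
  &= \frac{P(c^{Li}_n \mid \Theta^{Li}, s^\star)}
          {P(c^{Li}_n \mid \Theta^{Li}, s)}
\end{align*}
```

Every term involving the speaker's category or parameters cancels, so the listener needs only its own category and its own sign–category parameters to evaluate the ratio.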

6. Broader Implications and Integration with Co-Creative Learning

JA-NG under MHNG provides the first quantitative evidence that human–AI teams can engage in fully co-creative symbol emergence: both agents, each with partial and non-overlapping views of objects, fuse their beliefs through local interactions to form shared external representations. This mechanism supports symbiotic AI alignment, in which an artificial agent learns with—not only from—a human partner by balancing its own internal model with external proposals.

A plausible implication is that such interaction-driven alignment can produce communication protocols and shared symbolic repertoires adaptable to heterogeneous, multimodal environments, with direct application to mixed human–robot and hybrid agent systems. These results indicate a new paradigm for symbiotic human–AI collaboration, diverging from traditional supervised or unsupervised frameworks by implementing mutual decentralization in representation learning and symbol negotiation (Okumura et al., 18 Jun 2025).

7. Limitations and Future Directions

Current JA-NG experiments primarily use low-dimensional, synthetic stimuli (e.g., color patches, digits, restricted sign/index sets). Extending JA-NG to rich perceptual domains (such as natural language or complex vision), larger vocabularies, ecological referents, and continuous semantic spaces remains to be addressed.

The choice of generative model (Gaussian mixture, VAE hybridization), parameter initialization, and absence of constraints such as real-time interaction, metacognitive feedback, or ecological validity may affect the generality of observed dynamics. Future research may address mixed human–robot/AI games, scaling to many agents, integration with real-world perceptual tasks, and in-the-wild evaluation (Okumura et al., 2023, Taniguchi et al., 2022, Okumura et al., 18 Jun 2025).
