External Comparison Model (ECM)
- ECM is a probabilistic generative model that explains how primates modulate their subjective reward valuation through direct observation of partner rewards.
- It employs a multi-layered mMLDA framework with SERKET message-passing to integrate diverse behavioral and environmental modalities.
- Empirical results demonstrate that ECM outperforms similar models in predicting anticipatory licking, validating its approach to external reward comparison.
The External Comparison Model (ECM) is a probabilistic generative model developed to explain social comparison behaviors in primates, specifically evaluating how an individual’s subjective valuation of reward is modulated by the presence and magnitude of a partner’s rewards. The ECM is formulated within the multi-layered, multimodal Latent Dirichlet Allocation (mMLDA) framework and utilizes message-passing techniques to integrate data from multiple behavioral and environmental modalities. Central to the ECM is the direct incorporation of objective partner reward information, eschewing any explicit inference regarding the partner’s unobservable subjective valuation. This approach contrasts with alternative models that attempt to infer hidden states for the partner and has demonstrated superior empirical performance in explaining anticipatory-licking behavior in macaques under social reward paradigms (Taniuchi et al., 21 Dec 2025).
1. Probabilistic Generative Structure
The ECM operates under an mMLDA framework implemented via the SERKET message-passing architecture. Observations within each experimental block ("document") comprise five modalities: the action-intention of self, the action-intention of partner, the reward counts of self, the reward counts of partner, and a high-dimensional visual stimulus. All modalities are generated conditionally on a common latent "situation-awareness" topic, which corresponds to the six experimental social-comparison conditions. The self-valuation topic and both action topics (self and partner) are hierarchically dependent on the situation-awareness topic. Notably, the model omits any latent partner-valuation topic, instead modeling partner rewards as observed variables.
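The joint probability for a document factorizes along this hierarchy: a prior over the situation-awareness topic, the conditional distributions of the self-valuation and action topics given that topic, and the likelihood of each observed modality under the topic that generates it.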
Typical priors are Dirichlet-distributed vector priors for the topic and modality parameters, and conditional multinomials for the topic transition tables.
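The factorization described in this section can be written out schematically. The notation below is an assumption introduced for illustration, not the source's own symbols, and the Dirichlet-distributed parameters are left implicit. The map from each modality to its parent topic is kept generic, because the text here only states explicitly that partner reward counts feed the situation-awareness node.

```latex
% Assumed notation (not from the source):
%   z^{s}            situation-awareness topic
%   z^{v}            self-valuation topic
%   z^{a_s}, z^{a_p} action-intention topics of self and partner
%   w^{m}_{n}        n-th observation of modality m
%   \pi(m)           index of the topic that generates modality m
%   \mathcal{M}      the five observed modalities
\begin{equation}
p(\mathbf{w}, \mathbf{z}) =
  p(z^{s})\,
  p(z^{v} \mid z^{s})\,
  p(z^{a_s} \mid z^{s})\,
  p(z^{a_p} \mid z^{s})
  \prod_{m \in \mathcal{M}} \prod_{n=1}^{N_m}
  p\!\left(w^{m}_{n} \mid z^{\pi(m)}\right)
\end{equation}
```

No factor for a latent partner-valuation topic appears in the product, which is the structural point of the model.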
2. Mechanism of External Comparison
A defining aspect of the ECM is the direct influence of observed partner reward frequencies on the self's latent valuation. The partner's objective reward counts inform the top-level situation-awareness node, which in turn conditions the self-valuation topic. The crucial message update carries the situation node's belief, as shaped by the observed partner rewards, down to the self-valuation topic, ensuring that the pathway from partner reward counts to self-valuation is unmediated by any latent partner valuation. This embodies the principle of external comparison, operationalized as a direct statistical dependency between external (partner) outcomes and the self's own value estimation, with no attempt to reconstruct or model the partner's subjective state.
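The direct pathway can be illustrated with a toy two-step computation: a posterior over situation topics from partner reward counts, followed by a marginalization onto self-valuation topics. This is a minimal sketch under stated assumptions, not the SERKET implementation; the function names, the two-topic sizes, and all probability tables are hypothetical.

```python
import numpy as np

def situation_posterior(partner_counts, emission, prior):
    """Posterior over situation topics given partner reward counts.

    partner_counts: (V,) histogram of observed partner reward outcomes
    emission:       (K, V) per-topic outcome probabilities (rows sum to 1)
    prior:          (K,) prior over situation topics
    """
    log_post = np.log(prior) + partner_counts @ np.log(emission).T
    log_post -= log_post.max()          # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

def self_valuation_message(post_situation, transition):
    """Marginal over self-valuation topics implied by the situation belief.

    transition: (K_situation, K_valuation) conditional table (rows sum to 1)
    """
    return post_situation @ transition

# Hypothetical toy tables: outcomes are (no reward, reward).
emission = np.array([[0.75, 0.25],      # situation 0: partner rarely rewarded
                     [0.25, 0.75]])     # situation 1: partner often rewarded
prior = np.array([0.5, 0.5])
transition = np.array([[0.9, 0.1],
                       [0.2, 0.8]])

partner_counts = np.array([2, 8])       # partner rewarded on 8 of 10 trials
post = situation_posterior(partner_counts, emission, prior)
message = self_valuation_message(post, transition)
```

No variable in the sketch stands for the partner's subjective valuation; the observed counts reach the self-valuation belief through the situation topic alone.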
3. Learning and Inference Architecture
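Inference is conducted using collapsed Gibbs sampling for all latent variables, coordinated by SERKET's modular message passing between mMLDA components. For any latent topic node, the Gibbs update resamples each topic assignment with probability proportional to the product of two count-based terms, one for how often the topic is used in the current document and one for how often the topic has generated the observed value, each smoothed by the usual LDA-style Dirichlet pseudo-counts. Model convergence is empirically established after 300 total sweeps: each mMLDA module runs 100 sweeps, and the modules are cycled sequentially three times under SERKET orchestration, which stabilizes the posterior distribution of topic assignments.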
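For reference, the textbook collapsed Gibbs sampler for a single-modality LDA looks as follows. This is the standard form, offered as a sketch of the kind of update a single module performs; it is not the multi-layer mMLDA sampler itself, and the `alpha` and `beta` defaults are placeholders rather than values from the source.

```python
import numpy as np

def collapsed_gibbs_lda(docs, n_topics, vocab_size,
                        alpha=0.1, beta=0.1, n_sweeps=100, seed=0):
    """Textbook collapsed Gibbs sampling for LDA.

    docs: list of lists of word ids in range(vocab_size).
    Returns topic assignments and the document-topic / topic-word counts.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # document-topic counts
    n_kw = np.zeros((n_topics, vocab_size))  # topic-word counts
    n_k = np.zeros(n_topics)                 # tokens per topic
    z = []
    for d, doc in enumerate(docs):           # random initialization
        z_d = rng.integers(n_topics, size=len(doc))
        z.append(z_d)
        for w, k in zip(doc, z_d):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                  # remove the current assignment
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # p(z = k | rest) ∝ (n_dk + alpha) (n_kw + beta) / (n_k + V beta)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k                  # record the new assignment
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1
    return z, n_dk, n_kw

docs = [[0, 0, 1, 1], [2, 3, 3, 2]]
z, n_dk, n_kw = collapsed_gibbs_lda(docs, n_topics=2, vocab_size=4, n_sweeps=20)
```

Under the schedule described above, a loop of this kind would run for 100 sweeps per module, with the modules visited in sequence three times.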
4. Hyperparameter Specification and Latent Structure
All topic layers are configured with six topics (one per experimental condition). Dirichlet hyperparameters are set identically across modalities. Modality-weight hyperparameters, which reflect pseudo-count scaling for the visual, licking, self-reward, and partner-reward modalities, are optimized via Bayesian search to minimize the KL divergence between ECM-predicted and observed licking distributions; the optimal values are visual = 300, licking = 200, self-reward = 200, and partner-reward = 200. The latent topic variables are the two action-intention topics (self and partner), the self-valuation topic, and the situation-awareness topic; the absence of a partner-valuation topic is foundational to the model.
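One common reading of modality weighting in multimodal LDA is that each modality's histogram is rescaled so its total count equals the weight. The sketch below assumes that reading, which the text does not spell out, and pairs it with a plain KL divergence of the kind the search would minimize. The function names and the dictionary keys are hypothetical; only the four weight values come from the text.

```python
import numpy as np

# Weight values reported in the text; the key names are illustrative.
MODALITY_WEIGHTS = {
    "visual": 300,
    "licking": 200,
    "self_reward": 200,
    "partner_reward": 200,
}

def rescale_counts(hist, weight):
    """Rescale a count histogram so that it sums to `weight` (assumed rule)."""
    hist = np.asarray(hist, dtype=float)
    total = hist.sum()
    if total == 0:
        return hist                      # nothing to rescale
    return hist * (weight / total)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two histograms, normalized internally."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

scaled = rescale_counts([1, 3], MODALITY_WEIGHTS["licking"])
objective = kl_divergence([0.2, 0.8], [0.5, 0.5])
```

A Bayesian search would evaluate an objective like `kl_divergence(observed_licking, predicted_licking)` for each candidate weight setting and keep the minimizer.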
5. Empirical Evaluation and Dataset
The ECM is evaluated on data from Noritake et al. (2018), involving two face-to-face macaques performing a social comparison paradigm over 292 days, partitioned into six experimental blocks per day (yielding 1,752 blocks). Experimental manipulations yield three self-variable blocks (self reward at 25/50/75%, partner fixed at 20%) and three partner-variable blocks (partner reward at 25/50/75%, self fixed at 20%). Observed features per block include: normalized/quantized licking behaviors for both monkeys, reward frequencies, and visually encoded cues via a 512-dimensional VQ-VAE codebook (CIFAR-10 pretrained). Training is conducted on 233 days (1,398 blocks), with evaluation on 59 days (354 blocks).
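The block counts follow directly from the day counts, and the six conditions can be enumerated as probability pairs. The short sketch below checks that arithmetic; the constant and function names are illustrative.

```python
BLOCKS_PER_DAY = 6
TOTAL_DAYS, TRAIN_DAYS, TEST_DAYS = 292, 233, 59

# Six conditions as (self reward probability, partner reward probability).
SELF_VARIABLE = [(p, 0.20) for p in (0.25, 0.50, 0.75)]
PARTNER_VARIABLE = [(0.20, p) for p in (0.25, 0.50, 0.75)]
CONDITIONS = SELF_VARIABLE + PARTNER_VARIABLE

def block_count(days, blocks_per_day=BLOCKS_PER_DAY):
    """Number of blocks ("documents") contributed by a number of days."""
    return days * blocks_per_day

print(block_count(TOTAL_DAYS), block_count(TRAIN_DAYS), block_count(TEST_DAYS))
# prints: 1752 1398 354
```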
6. Quantitative Performance and Comparative Model Analysis
ECM outperforms both the Internal Prediction Model (IPM) and the No Comparison Model (NCM) in classifying the self-valuation topic into the true experimental conditions, as summarized:
| Model | Rand Index (mean over test days) |
|---|---|
| ECM | 0.88 |
| IPM | 0.79 |
| NCM | 0.75 |
| Chance | 0.72 |
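The Rand index in the table is the fraction of item pairs on which two labelings agree, meaning both place the pair together or both place it apart. The sketch below gives a direct pairwise implementation. It also gives the large-sample value expected when both labelings are independent and uniform over k labels, which for k = 6 is 26/36 ≈ 0.72. That figure matches the Chance row, although how the chance level was actually computed is not stated here, so the match is an observation rather than a derivation.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def chance_rand_index(k):
    """Large-sample Rand index for two independent uniform k-way labelings."""
    same = 1.0 / k                      # probability a pair shares a label
    return same * same + (1.0 - same) * (1.0 - same)

perfect = rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # label names do not matter
crossed = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
chance6 = chance_rand_index(6)
```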
Behavioral predictions from ECM, given visual cues alone, correctly replicate (i) increased anticipatory licking for increasing self reward in self-variable blocks, and (ii) decreased licking as partner reward probability increases in partner-variable blocks. The alternative models (IPM, NCM) fail to capture these patterns. On "imaginary" (unseen) reward splits, ECM preserves the monotonic decrease in self-value as partner reward probability rises. Normalized Mutual Information (NMI) calculations indicate that the pathway from partner reward to self-valuation transmits significantly more information in ECM than in IPM (ECM median NMI ≈ 7.28×10⁻³, IPM ≈ 3.66×10⁻³), supporting the efficiency of direct external comparison over latent-state inference.
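NMI between two discrete variables can be computed from their joint histogram. The sketch below normalizes mutual information by the arithmetic mean of the two entropies. That normalization is an assumption, since several conventions exist and the one used for the reported medians is not specified here.

```python
import numpy as np

def normalized_mutual_information(x, y):
    """NMI of two discrete label sequences, normalized by the mean entropy."""
    _, xi = np.unique(np.asarray(x), return_inverse=True)
    _, yi = np.unique(np.asarray(y), return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)        # joint histogram of (x, y) pairs
    joint /= joint.sum()
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    nz = joint > 0                       # skip empty cells to avoid log(0)
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    denom = 0.5 * (hx + hy)
    return float(mi / denom) if denom > 0 else 0.0

identical = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
independent = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Values on the order of 10⁻³, as reported above, sit close to the independent end of this scale, so the comparison of interest is the ratio between models rather than the absolute magnitude.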
7. Assumptions, Limitations, and Prospective Extensions
The ECM is predicated on the assumption that monkeys use observable, objective differences in reward to guide social comparison, rather than inferring unobservable subjective partner valuations. This is facilitated in the experimental design by equating reward modality and value across both actors (identical water rewards), minimizing the likelihood of forced subjective divergence. Data preprocessing aggregates trial-wise behaviors into block-level summaries; exclusive reward delivery per trial is not explicitly modeled at the single-trial level.
Future directions include neuroscientific validation by mapping ECM’s latent variables to neural recordings, human studies with direct self/other-valuation reporting (e.g., ultimatum, fairness paradigms), and tasks where the material incentives differ qualitatively between agents, to empirically test the boundaries of external versus internal social comparison strategies. A plausible implication is that, in contexts lacking reward homomorphism, subjective partner-value inference may emerge as necessary, in contrast to the direct-objective pathway sufficing under the present paradigm (Taniuchi et al., 21 Dec 2025).