Model Agreement via Anchoring

Updated 4 March 2026

The framework provides a formal definition of model disagreement and introduces anchoring to reduce output divergence between ML models.
It details algorithmic instantiations across regression, neural networks, and diffusion models, revealing quantifiable bounds and improved reliability.
The approach extends to high-dimensional latent spaces and LLMs by using anchors to expose biases, enhance alignment, and calibrate model confidence.

Model Agreement via Anchoring is an analytical and practical framework for reducing or exploiting model disagreement, defined as the expected divergence in outputs between independently trained machine learning models, through the introduction of explicit anchor signals. Anchoring techniques have been developed to bound disagreement, improve model reliability, enhance alignment, and expose underlying biases across classical predictors, neural networks, diffusion architectures, and LLMs.

1. Formalization of Model Disagreement and Anchoring

In regression settings, model disagreement is quantified as $D(f_1,f_2) = \mathbb{E}_{x\sim P}[(f_1(x)-f_2(x))^2]$, where $f_1$ and $f_2$ are independently trained predictors. Anchoring introduces the midpoint model $\bar{f} = \frac12(f_1 + f_2)$ as a reference. With $\mathrm{MSE}(f) = \mathbb{E}[(f(x)-y)^2]$ denoting the mean squared error against the target $y$, this yields the Midpoint Identity:

$D(f_1,f_2) = 2 \bigl[ \mathrm{MSE}(f_1) + \mathrm{MSE}(f_2) - 2\,\mathrm{MSE}(\bar{f}) \bigr]$
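
A short derivation sketch of the identity, taking $\mathrm{MSE}(f) = \mathbb{E}[(f(x)-y)^2]$ for a target $y$ and writing $a = f_1(x)-y$ and $b = f_2(x)-y$ (both symbols are introduced here only for the derivation). The parallelogram law gives

$(a-b)^2 = 2a^2 + 2b^2 - (a+b)^2 = 2a^2 + 2b^2 - 4\bigl(\tfrac{a+b}{2}\bigr)^2$

Since $a - b = f_1(x) - f_2(x)$ and $\tfrac{a+b}{2} = \bar{f}(x) - y$, taking expectations term by term gives

$D(f_1,f_2) = 2\,\mathrm{MSE}(f_1) + 2\,\mathrm{MSE}(f_2) - 4\,\mathrm{MSE}(\bar{f})$

which is the Midpoint Identity after factoring out 2.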

If $\bar{f}$ lies in a hypothesis class $\mathcal{H}$, its error cannot fall below the best error achievable in $\mathcal{H}$, which yields the Anchor Bound:

$D(f_1,f_2) \leq 2\bigl[\mathrm{MSE}(f_1) - R(\mathcal{H})\bigr] + 2\bigl[\mathrm{MSE}(f_2) - R(\mathcal{H})\bigr]$

where $R(\mathcal{H}) = \inf_{h\in\mathcal{H}}\mathrm{MSE}(h)$ is the best risk attainable in $\mathcal{H}$. The argument generalizes to multi-dimensional outputs and to strongly convex losses, where the bound picks up a $1/\mu$ factor, with $\mu$ the strong convexity parameter. The anchoring framework thus relates run-to-run variability to the optimization landscape and the richness of the model class (Eaton et al., 26 Feb 2026).
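
The identity and the bound can be checked numerically. The sketch below is illustrative only: the synthetic data, the two fixed linear predictors standing in for independently trained models, and the class $\mathcal{H} = \{x \mapsto a x\}$ are all assumptions made for the example, and every quantity is computed on the sample rather than on the population.

```python
import math
import random

def mse(f, xs, ys):
    """Empirical mean squared error of predictor f on the sample (xs, ys)."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def disagreement(f, g, xs):
    """Empirical D(f, g): mean squared difference between two predictors."""
    return sum((f(x) - g(x)) ** 2 for x in xs) / len(xs)

def anchor_bound(mse_1, mse_2, best_risk):
    """Right-hand side of the Anchor Bound, given R(H) = best_risk."""
    return 2 * (mse_1 - best_risk) + 2 * (mse_2 - best_risk)

# Synthetic regression data (an illustrative assumption, not from the source).
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
ys = [math.sin(3 * x) + 0.1 * rng.gauss(0.0, 1.0) for x in xs]

# Two hypothetical "independently trained" models from H = {x -> a * x}.
f1 = lambda x: 2.0 * x
f2 = lambda x: 3.0 * x
f_bar = lambda x: 0.5 * (f1(x) + f2(x))  # midpoint anchor, also in H

# Best-in-class risk R(H): closed-form least-squares slope on the sample.
a_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
best_risk = mse(lambda x: a_star * x, xs, ys)

d = disagreement(f1, f2, xs)
midpoint_rhs = 2 * (mse(f1, xs, ys) + mse(f2, xs, ys) - 2 * mse(f_bar, xs, ys))
bound = anchor_bound(mse(f1, xs, ys), mse(f2, xs, ys), best_risk)
print(f"D = {d:.4f}, identity RHS = {midpoint_rhs:.4f}, anchor bound = {bound:.4f}")
```

The first two printed numbers agree up to floating-point error, and the third is never smaller, because the midpoint of two members of this class is itself a member.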

2. Algorithmic Instantiations of Anchoring for Agreement

The anchoring methodology applies to a variety of algorithms:

Stacked Aggregation: Train $k$ models $G$, stack them to form $f_1 = \arg\min_{f\in\mathrm{span}(G)} \mathrm{MSE}(f)$, and independently form $f_2$ from a second set $G'$. Anchoring in the stacked class $\mathrm{span}(G \cup G')$ bounds the expected disagreement: $\mathbb{E}[D(f_1,f_2)] \leq 4[\bar{R}_k-\bar{R}_{2k}]$.
Gradient Boosting: For a fixed weak-learner class $\mathcal{C}$ and $k$ rounds, disagreement decreases as $\mathcal{O}(1/k)$. The anchor is the $k$-stage average function, leading to $D(f_1,f_2) \leq 32(\tau^*)^2/k + \mathcal{O}(\varepsilon_t^2)$.
Neural Networks (NNs): For NN classes with $n$ hidden units, the midpoint closure property ensures $f_1, f_2 \in NN_n \implies \bar{f} \in NN_{2n}$, yielding $D(f_1,f_2) \leq 4[R(NN_n)-R(NN_{2n})+\varepsilon]$.
Regression Trees: With maximum depth $d$, midpoint closure supports similar anchoring bounds, so disagreement shrinks as tree depth increases (Eaton et al., 26 Feb 2026).

These bounds explain empirical observations that increasing ensemble size, model width, or iteration count both improves accuracy and enforces predictive stability.
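
The stacked-aggregation case can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of the cited paper: the base models are hypothetical random tanh features rather than trained models, the helper names (`solve`, `stack`, `random_base_model`) are invented for the example, and risks are measured on the training sample, where the anchor argument holds exactly because the midpoint of the two stacks lies in the span of the union.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= factor * m[col][j]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w

def stack(basis, xs, ys):
    """Least-squares combination of the base predictors (normal equations)."""
    feats = [[g(x) for g in basis] for x in xs]
    k = len(basis)
    gram = [[sum(row[i] * row[j] for row in feats) for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * y for row, y in zip(feats, ys)) for i in range(k)]
    w = solve(gram, rhs)
    return lambda x: sum(wi * g(x) for wi, g in zip(w, basis))

def mean_sq(u, v):
    """Mean squared difference between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(u, v)) / len(u)

def random_base_model(rng):
    """Hypothetical stand-in for a trained base model: a random tanh feature."""
    w, b = rng.uniform(0.5, 3.0), rng.uniform(-1.0, 1.0)
    return lambda x: math.tanh(w * x + b)

rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(300)]
ys = [math.sin(2 * x) + 0.1 * rng.gauss(0.0, 1.0) for x in xs]

k = 2
G = [random_base_model(rng) for _ in range(k)]
G_prime = [random_base_model(rng) for _ in range(k)]

f1, f2 = stack(G, xs, ys), stack(G_prime, xs, ys)
f_union = stack(G + G_prime, xs, ys)  # best predictor in span(G ∪ G')

p1, p2, pu = ([f(x) for x in xs] for f in (f1, f2, f_union))
risk_union = mean_sq(pu, ys)
d = mean_sq(p1, p2)
bound = 2 * (mean_sq(p1, ys) - risk_union) + 2 * (mean_sq(p2, ys) - risk_union)
print(f"D(f1, f2) = {d:.5f} <= anchor bound = {bound:.5f}")
```

The gap between each stack's risk and the risk of the doubled stack is what controls disagreement here, which mirrors the $\bar{R}_k - \bar{R}_{2k}$ term in the bound above.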

3. Anchoring in Deep Representation Spaces

Disagreement in high-dimensional latent spaces is addressed by reference to pre-trained "anchor" models such as foundation encoders (e.g., CLIP, ViT). For a trained model $B$, the latent representation $z=B(x)$ is compared with the anchor representations $h^i = H_i(x)$ across a pool of samples via a neighborhood-based agreement score. The approach is invariant to affine distortions and dimension mismatch because it evaluates only the relative ordering (permutation) of neighbors in each latent space (Deng et al., 2023).

The pipeline consists of:

Extracting latent features $Z, H^{(i)}$ for a sample pool.
Computing $k$-nearest neighbor rankings $\Pi^*, \Pi^i$ via cosine similarity.
Scoring agreement as $\mathrm{NDCG}(\Pi^*, \Pi^i, r)$, where $r$ encodes neighborhood relevance.
Averaging across multiple anchors for robustness.

This agreement score predicts failure and reliability without requiring anchor model fine-tuning, and, when fused into softmax calibration (confidence scaling), substantially improves AUROC for failure detection across in-distribution and OOD regimes (Deng et al., 2023).
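
The four-step pipeline above can be sketched as follows. This is a minimal illustration, not the cited implementation: the graded relevance $r$ (the anchor's nearest neighbor scores $k$, the next $k-1$, and so on), the choice to derive relevance from the anchor's ranking, and all function names are assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn_ranking(feats, query, k):
    """Indices of the k nearest neighbours of feats[query], most similar first."""
    sims = [(cosine(feats[query], f), j) for j, f in enumerate(feats) if j != query]
    sims.sort(key=lambda t: -t[0])
    return [j for _, j in sims[:k]]

def ndcg_agreement(model_feats, anchor_feats, query, k):
    """NDCG of the model's neighbour ranking, scored against the anchor's ranking."""
    model_rank = knn_ranking(model_feats, query, k)
    anchor_rank = knn_ranking(anchor_feats, query, k)
    # Assumed relevance r: anchor's nearest neighbour gets k, the next k - 1, ...
    relevance = {j: k - pos for pos, j in enumerate(anchor_rank)}
    dcg = sum(relevance.get(j, 0) / math.log2(pos + 2) for pos, j in enumerate(model_rank))
    ideal = sum((k - pos) / math.log2(pos + 2) for pos in range(k))
    return dcg / ideal

def agreement_score(model_feats, anchors, k):
    """Average NDCG agreement over all pool samples and all anchor models."""
    n = len(model_feats)
    scores = [ndcg_agreement(model_feats, a, q, k) for a in anchors for q in range(n)]
    return sum(scores) / len(scores)

def unit(deg):
    """2-D unit vector at the given angle in degrees."""
    t = math.radians(deg)
    return [math.cos(t), math.sin(t)]

# Toy pool of five samples in the model's latent space.
model_feats = [unit(a) for a in (0, 10, 25, 45, 85)]
# Anchor with the same geometry but a different scale and an extra dimension.
consistent_anchor = [[2.0 * v[0], 2.0 * v[1], 0.0] for v in model_feats]
# Anchor whose neighbourhoods differ from the model's.
conflicting_anchor = [unit(a) for a in (0, 85, 45, 25, 10)]

same = agreement_score(model_feats, [consistent_anchor], k=3)
diff = agreement_score(model_feats, [conflicting_anchor], k=3)
print(f"consistent anchor: {same:.3f}, conflicting anchor: {diff:.3f}")
```

Because only neighbor orderings are compared, the model and anchor features never need to share a dimension or scale, which is the property the ranking-based design relies on.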

4. Anchoring in Alignment and Preference Optimization

Anchoring extends to the alignment of generative models and LLMs by incorporating explicit "anchor preference pairs" that exploit knowledge of the ground truth or divide outputs into semantically stable categories. In self-explanation enhancement, preference sets are constructed by categorizing prompts as consistently correct (CC), variable (V), or consistently incorrect (CI), with a pairing strategy specific to each category. These pairs form the training data for direct preference optimization (DPO), which trains the LLM to raise the likelihood of high-quality, ground-truth-aligned explanations relative to weaker outputs (Villa-Arenas et al., 2024). The methodology involves:

Supervised fine-tuning on downstream tasks (without rationale supervision).
Generation and scoring of diverse predictions/explanations.
Anchor-based partitioning of outputs and formation of preference pairs.
Optimization under DPO with temperature $\beta$ on anchor pairs.

Empirically, this leads to models that maintain or enhance accuracy while generating higher-quality explanations, with performance gains scaling with the fraction of prompts in the V/CI buckets.
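
A minimal sketch of two pieces of this recipe: bucketing a prompt by the correctness of its sampled predictions, and the standard per-pair DPO loss with temperature $\beta$. The function names and the toy log-probabilities are assumptions for illustration, and the category-specific pairing strategies of the cited work are not reproduced here.

```python
import math

def categorize(correct_flags):
    """Bucket a prompt by the correctness of its sampled predictions."""
    if all(correct_flags):
        return "CC"  # consistently correct
    if not any(correct_flags):
        return "CI"  # consistently incorrect
    return "V"       # variable

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair: -log sigmoid(beta * margin).

    The margin compares how much the policy has moved away from the reference
    model on the chosen output versus the rejected output.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical correctness flags for three prompts, five samples each.
samples = {
    "prompt_a": [True, True, True, True, True],
    "prompt_b": [True, False, True, False, False],
    "prompt_c": [False, False, False, False, False],
}
buckets = {name: categorize(flags) for name, flags in samples.items()}
print(buckets)

# Toy sequence log-probabilities for one anchor pair (made-up numbers).
untrained = dpo_loss(-2.0, -2.0, -2.0, -2.0, beta=0.5)
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=0.5)
print(f"loss before: {untrained:.4f}, after preferring the chosen output: {improved:.4f}")
```

The loss starts at $\log 2$ when the policy equals the reference and falls as the policy shifts probability toward the chosen explanation.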

5. Dual-Path and Modulated Anchoring in Structured Generation

In deep sequence models with U-Net or diffusion backbones, anchoring can be multi-modal and modulated. The LUMA framework introduces dual-path anchoring, combining a temporal anchor (MoCLIP features trained via contrastive learning) and a frequency anchor (low-frequency DCT coefficients of the target motion) (Jia et al., 29 Sep 2025).

Both anchors are adaptively fused through FiLM-modulated scaling and offsets that depend on the diffusion timestep, allowing strong coarse-grained semantic regularization early and fine-grained temporal/frequency refinement later. This accelerates convergence and improves FID and Recall, with ablations confirming that both anchors are essential. Limitations include the fixed DCT cutoff and the need to retrain MoCLIP for each domain.
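
Two of the building blocks named above, a low-frequency DCT anchor and FiLM modulation, can be sketched in isolation. This is not the LUMA implementation: in a real model the FiLM scale and offset would come from a learned function of the timestep and the anchors, whereas here they are passed in as plain numbers, and the function names, toy trajectory, and cutoff value are hypothetical.

```python
import math

def dct2(signal):
    """Unnormalised DCT-II: X_k = sum_n x_n * cos(pi / N * (n + 0.5) * k)."""
    n_pts = len(signal)
    return [
        sum(x * math.cos(math.pi / n_pts * (n + 0.5) * k) for n, x in enumerate(signal))
        for k in range(n_pts)
    ]

def low_frequency_anchor(signal, cutoff):
    """Keep only the first `cutoff` DCT coefficients (a fixed cutoff)."""
    return dct2(signal)[:cutoff]

def film(features, gamma, beta):
    """FiLM: feature-wise affine modulation gamma * h + beta."""
    return [g * h + b for h, g, b in zip(features, gamma, beta)]

# Toy 1-D motion channel: a slow trend plus frame-to-frame jitter.
n_frames = 16
trajectory = [0.1 * n + 0.05 * (-1) ** n for n in range(n_frames)]
anchor = low_frequency_anchor(trajectory, cutoff=4)
print("low-frequency anchor:", [round(c, 3) for c in anchor])

# Hypothetical hidden features and modulation parameters for one timestep.
hidden = [0.5, -1.0, 2.0, 0.0]
modulated = film(hidden, gamma=[1.5, 1.0, 0.5, 1.0], beta=[0.1, 0.0, 0.0, 0.0])
print("modulated features:", modulated)
```

Truncating the DCT keeps the slow trend of the trajectory and discards the jitter, which is why low-frequency coefficients are a natural coarse anchor.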

6. Anchoring Bias and Model Agreement in LLMs

Anchoring effects, traditionally conceptualized as cognitive biases in humans, manifest in LLMs as measurable shifts in generated output distributions in response to numeric or categorical anchor cues (Valencia-Clavijo, 7 Nov 2025). Model agreement is quantified by both behavioral (difference in soft expected value, $\Delta EV$) and attributional (Shapley value $\phi$ of the anchor field) analyses, which are integrated into an Anchoring Bias Sensitivity Score (ABSS). Experiments confirm:

Robust anchoring effects (positive $\Delta EV$ and $\Delta\phi$) in larger models (Gemma-2B, phi-2, Llama-2-7B).
Attributional fragility in smaller models (e.g., GPT-Neo-125M), suggesting that their surface-level agreement may be misleading.
ABSS combines strength, statistical significance, and concordance of behavioral and attributional signals.

The results indicate that anchoring in LLMs is driven internally by a reweighting of log-probability mass rather than by simple copying of the anchor into the output, which has concrete implications for safety in domains where spurious cues may drive systematic errors.
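
The behavioral and attributional measurements can be sketched on toy numbers. The definitions below are assumptions for illustration: the soft expected value is taken as the probability-weighted mean over numeric candidate answers, $\Delta EV$ as the anchored value minus a no-anchor control, and the Shapley computation as the exact permutation average over a handful of prompt fields. The ABSS combination rule is not reproduced, and the log-probabilities and value function are made up.

```python
import math
from itertools import permutations

def soft_expected_value(logprobs):
    """Expected numeric answer under softmax-normalised candidate log-probabilities."""
    peak = max(logprobs.values())
    weights = {value: math.exp(lp - peak) for value, lp in logprobs.items()}
    total = sum(weights.values())
    return sum(value * w for value, w in weights.items()) / total

def shapley_values(fields, value_fn):
    """Exact Shapley values: average marginal contribution over all field orderings."""
    totals = {f: 0.0 for f in fields}
    orderings = list(permutations(fields))
    for order in orderings:
        present = frozenset()
        for f in order:
            with_f = present | {f}
            totals[f] += value_fn(with_f) - value_fn(present)
            present = with_f
    return {f: t / len(orderings) for f, t in totals.items()}

# Toy candidate answers (10 or 20) with made-up model log-probabilities.
control = {10: math.log(0.5), 20: math.log(0.5)}
anchored = {10: math.log(0.25), 20: math.log(0.75)}  # a high anchor shifts mass to 20
delta_ev = soft_expected_value(anchored) - soft_expected_value(control)
print(f"Delta EV = {delta_ev:.3f}")

def toy_expected_value(present_fields):
    """Hypothetical model output as a function of which prompt fields are shown."""
    value = 15.0
    if "anchor" in present_fields:
        value += 2.5
    if "context" in present_fields:
        value += 0.5
    return value

phi = shapley_values(["question", "context", "anchor"], toy_expected_value)
print("Shapley attributions:", phi)
```

In this additive toy the anchor field's Shapley value equals the behavioral shift exactly. Real models need not behave that way, and a mismatch between the two signals is the kind of fragility the attributional analysis is meant to reveal.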

7. Trade-offs, Practical Implications, and Open Questions

Anchoring provides theoretically grounded tools for controlling model disagreement and exploiting reference structures for reliability, alignment, and interpretability. Key considerations include:

Scaling model capacity (ensemble size, boosting rounds, width) yields tighter agreement, which explains the empirical stability of large models.
Agreement bounds derived via anchoring can guide resource allocation: $k$, $n$, or $d$ can be chosen to balance accuracy and robustness.
Algorithmic extensions exist for alternative metrics (e.g., Jaccard, Spearman) and broader settings (multi-modal/multitask).
Limitations: Results are population-level; finite-sample corrections are not explicit; extension to non-convex or classification losses remains open (Eaton et al., 26 Feb 2026).
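
As a rough illustration of using such bounds for resource allocation, the sketch below picks the smallest ensemble size $k$ whose stacking bound $4[\bar{R}_k-\bar{R}_{2k}]$ falls under a disagreement budget. The risk curve is made up, the helper names are hypothetical, and in practice the risks would be noisy finite-sample estimates, which the population-level bound does not account for.

```python
def stacking_disagreement_bound(risk_k, risk_2k):
    """Stacking bound on expected disagreement: 4 * (R_k - R_2k)."""
    return 4.0 * (risk_k - risk_2k)

def smallest_k_within(risk_by_k, tolerance):
    """Smallest k (with 2k also measured) whose bound is at most `tolerance`."""
    for k in sorted(risk_by_k):
        if 2 * k in risk_by_k:
            if stacking_disagreement_bound(risk_by_k[k], risk_by_k[2 * k]) <= tolerance:
                return k
    return None  # no measured k meets the budget

# Hypothetical estimated risks for ensembles of size 1, 2, 4, 8, 16.
risk_by_k = {1: 0.50, 2: 0.30, 4: 0.22, 8: 0.20, 16: 0.195}

for k in (1, 2, 4, 8):
    bound = stacking_disagreement_bound(risk_by_k[k], risk_by_k[2 * k])
    print(f"k = {k}: disagreement bound = {bound:.3f}")

print("smallest k with bound <= 0.1:", smallest_k_within(risk_by_k, 0.1))
```

The bound tightens wherever the risk curve flattens, so the point of diminishing returns in accuracy is also the point where independently trained stacks start to agree.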

A plausible implication is that anchoring can be systematically adapted to diverse paradigms and modalities, but careful calibration, anchor construction, and understanding of internal model dynamics remain active research frontiers.

References

Eaton et al. (2026). Model Agreement via Anchoring.

Deng et al. (2023). Great Models Think Alike: Improving Model Reliability via Inter-Model Latent Agreement.

Villa-Arenas et al. (2024). Anchored Alignment for Self-Explanations Enhancement.

Jia et al. (2025). LUMA: Low-Dimension Unified Motion Alignment with Dual-Path Anchoring for Text-to-Motion Diffusion Model.

Valencia-Clavijo (2025). Anchors in the Machine: Behavioral and Attributional Evidence of Anchoring Bias in LLMs.
