
Model Agreement via Anchoring

Updated 4 March 2026
  • The framework provides a formal definition of model disagreement and introduces anchoring to reduce output divergence between ML models.
  • It details algorithmic instantiations across regression, neural networks, and diffusion models, revealing quantifiable bounds and improved reliability.
  • The approach extends to high-dimensional latent spaces and LLMs by using anchors to expose biases, enhance alignment, and calibrate model confidence.

Model Agreement via Anchoring is an analytical and practical framework for reducing or exploiting model disagreement—defined as the expected divergence in outputs between independently trained machine learning models—through the introduction of explicit anchor signals. Anchored techniques have been systematically developed to bound disagreement, improve model reliability, enhance alignment, and expose underlying biases across classical predictors, neural networks, diffusion architectures, and LLMs.

1. Formalization of Model Disagreement and Anchoring

In regression settings, model disagreement is quantified as $D(f_1,f_2) = \mathbb{E}_{x\sim P}[(f_1(x)-f_2(x))^2]$, where $f_1$ and $f_2$ are independent predictors. Anchoring introduces an average or reference model $\bar{f} = \frac12(f_1 + f_2)$, yielding the Midpoint Identity:

$$D(f_1,f_2) = 2 \bigl[ \mathrm{MSE}(f_1) + \mathrm{MSE}(f_2) - 2\,\mathrm{MSE}(\bar{f}) \bigr]$$

If $\bar{f}$ resides in a hypothesis class $\mathcal{H}$, this yields an Anchor Bound:

$$D(f_1,f_2) \leq 2\bigl[\mathrm{MSE}(f_1) - R(\mathcal{H})\bigr] + 2\bigl[\mathrm{MSE}(f_2) - R(\mathcal{H})\bigr]$$

where $R(\mathcal{H}) = \inf_{h\in\mathcal{H}}\mathrm{MSE}(h)$. This technique generalizes to multi-dimensional or strongly convex loss settings with a $1/\mu$ factor, where $\mu$ is the strong convexity parameter. The anchoring framework can thus relate run-to-run variability to the optimization landscape and model class richness (Eaton et al., 26 Feb 2026).
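The Midpoint Identity can be checked by writing the residuals as a = f_1 - y and b = f_2 - y and expanding the squares, and the Anchor Bound then follows because the midpoint's MSE cannot fall below the best risk in any class that contains it. The following is a minimal numerical sketch of both statements, not a reproduction of the cited paper's experiments. It assumes a synthetic linear data-generating process, treats one large sample as the population, and takes the hypothesis class to be linear predictors; all names (`fit_linear`, `risk_H`, and so on) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "population": one large sample standing in for x ~ P (assumption).
n, d = 20_000, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)

def mse(pred):
    """Mean squared error against y on the population sample."""
    return float(np.mean((pred - y) ** 2))

def fit_linear(idx):
    """Least-squares fit on a small training subset, evaluated everywhere."""
    w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return X @ w

# Two independently trained predictors: separate random training subsets.
f1 = fit_linear(rng.choice(n, size=30, replace=False))
f2 = fit_linear(rng.choice(n, size=30, replace=False))
f_bar = 0.5 * (f1 + f2)  # midpoint anchor, itself a linear predictor

disagreement = float(np.mean((f1 - f2) ** 2))
midpoint_rhs = 2.0 * (mse(f1) + mse(f2) - 2.0 * mse(f_bar))

# R(H) for H = linear predictors, computed on the same population sample.
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)
risk_H = mse(X @ w_star)
anchor_bound = 2.0 * (mse(f1) - risk_H) + 2.0 * (mse(f2) - risk_H)

print(f"D(f1, f2)          = {disagreement:.6f}")
print(f"midpoint identity  = {midpoint_rhs:.6f}")
print(f"anchor bound       = {anchor_bound:.6f}")
```

The first two printed values should agree up to floating-point error, and the third should be at least as large. The gap between them is the excess risk of the midpoint over the best linear predictor.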

2. Algorithmic Instantiations of Anchoring for Agreement

The anchoring methodology applies to a variety of algorithms:

  • Stacked Aggregation: Train $k$ models, aggregate outputs to form $f_1 = \arg\min_{f \in \mathrm{span}(G)} \mathrm{MSE}(f)$, and independently form $f_2$ from $G'$. Disagreement is bounded by the stacked class $\mathrm{span}(G \cup G')$ with $\mathbb{E}[D(f_1,f_2)] \leq 4[\bar{R}_k-\bar{R}_{2k}]$.
  • Gradient Boosting: For fixed weak-class $\mathcal{C}$ and $k$ rounds, disagreement decreases as $\mathcal{O}(1/k)$. The anchor is the $k$-stage average function, leading to $D(f_1,f_2) \leq 32(\tau^*)^2/k + \mathcal{O}(\varepsilon_t^2)$.
  • Neural Networks (NNs): For NN classes with $n$ hidden units, the midpoint closure property ensures $f_1, f_2 \in \mathrm{NN}_n \implies \bar{f} \in \mathrm{NN}_{2n}$, yielding $D(f_1,f_2) \leq 4[R(\mathrm{NN}_n)-R(\mathrm{NN}_{2n})+\varepsilon]$.
  • Regression Trees: With maximum depth $d$, midpoint closure supports similar anchoring bounds, leading to shrinking disagreement as tree depth increases (Eaton et al., 26 Feb 2026).

These bounds explain empirical observations that increasing ensemble size, model width, or iteration count both improves accuracy and enforces predictive stability.
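The stacked-aggregation case can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the cited paper's procedure: random ReLU features stand in for the $k$ base models in $G$ and $G'$, a fixed sample is treated as the population, and stacking is plain least squares over the span. Because the midpoint of the two stacked predictors lies in $\mathrm{span}(G \cup G')$, the per-run anchor bound should hold for every $k$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed evaluation sample treated as the population (assumption).
n, d = 5_000, 10
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=n)

def random_base_models(k):
    """Outputs of k hypothetical base models: random ReLU features of X."""
    W = rng.normal(size=(d, k))
    b = rng.normal(size=k)
    return np.maximum(X @ W + b, 0.0)

def stack(G):
    """Least-squares stacking: best predictor in span(G) and its MSE."""
    w, *_ = np.linalg.lstsq(G, y, rcond=None)
    pred = G @ w
    return pred, float(np.mean((pred - y) ** 2))

def one_run(k):
    G, G_prime = random_base_models(k), random_base_models(k)
    f1, mse1 = stack(G)
    f2, mse2 = stack(G_prime)
    # Anchor class: span(G u G'), which contains the midpoint (f1 + f2) / 2.
    _, risk_union = stack(np.hstack([G, G_prime]))
    disagreement = float(np.mean((f1 - f2) ** 2))
    bound = 2.0 * (mse1 - risk_union) + 2.0 * (mse2 - risk_union)
    return disagreement, bound

results = {k: one_run(k) for k in (2, 4, 8, 16, 32)}
for k, (dis, bound) in results.items():
    print(f"k={k:2d}  disagreement={dis:.4f}  anchor bound={bound:.4f}")
```

Averaging `one_run(k)` over many seeds would approximate the expectation form $4[\bar{R}_k-\bar{R}_{2k}]$. A single run only demonstrates the per-realization inequality.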

3. Anchoring in Deep Representation Spaces

Disagreement in high-dimensional latent spaces is addressed by reference to pre-trained "anchor" models such as foundation encoders (e.g., CLIP, ViT). For a trained model $B$, the latent representation $z=B(x)$ is compared to $h^i = H_i(x)$ across a pool of samples via a neighborhood-based agreement score. The approach is invariant to affine distortions and dimension mismatch because it evaluates the relative ordering (permutation) of neighbors in the latent space (Deng et al., 2023).

The pipeline consists of:

  • Extracting latent features $Z, H^{(i)}$ for a sample pool.
  • Computing $k$-nearest neighbor rankings $\Pi^*, \Pi^i$ via cosine similarity.
  • Scoring agreement as $\mathrm{NDCG}(\Pi^*, \Pi^i, r)$, where $r$ encodes neighborhood relevance.
  • Averaging across multiple anchors for robustness.

This agreement score predicts failure and reliability without requiring anchor model fine-tuning, and, when fused into softmax calibration (confidence scaling), substantially improves AUROC for failure detection across in-distribution and OOD regimes (Deng et al., 2023).
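The pipeline above can be sketched in a few lines of NumPy. This is a simplified illustration, not the scoring function of Deng et al.: the relevance grading (the anchor's $j$-th nearest neighbor receives grade $k-j+1$, everything else 0), the linear-gain NDCG, and the function names are all assumptions made for the example. Note that the model and anchor features may have different dimensions, since only neighbor orderings are compared.

```python
import numpy as np

def knn_order(feats, k):
    """Indices of each row's k nearest neighbors by cosine similarity."""
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)          # a sample is not its own neighbor
    return np.argsort(-sim, axis=1)[:, :k]  # most similar first

def neighborhood_ndcg(model_feats, anchor_feats, k=10):
    """Mean NDCG@k of the model's neighbor ranking, graded by the anchor's."""
    n = model_feats.shape[0]
    model_nn = knn_order(model_feats, k)
    anchor_nn = knn_order(anchor_feats, k)

    # Hypothetical relevance r: grades k, k-1, ..., 1 for the anchor's top-k.
    grades = np.arange(k, 0, -1)
    relevance = np.zeros((n, n))
    rows = np.arange(n)[:, None]
    relevance[rows, anchor_nn] = grades

    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = (relevance[rows, model_nn] * discounts).sum(axis=1)
    ideal_dcg = (grades * discounts).sum()
    return float(np.mean(dcg / ideal_dcg))

def anchored_agreement(model_feats, anchors, k=10):
    """Average the agreement score over several anchor encoders."""
    return float(np.mean([neighborhood_ndcg(model_feats, h, k) for h in anchors]))

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 16))            # model latents
H_unrelated = rng.normal(size=(200, 32))  # anchor with different dimension
score_same = neighborhood_ndcg(Z, Z)
score_random = neighborhood_ndcg(Z, H_unrelated)
score_avg = anchored_agreement(Z, [Z, H_unrelated])
print(score_same, score_random, score_avg)
```

In practice the per-sample scores (before the final mean) would be the quantity fused into confidence calibration. The mean is used here only to keep the example short.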

4. Anchoring in Alignment and Preference Optimization

Anchoring extends to the alignment of generative models and LLMs by incorporating explicit "anchor preference pairs" that exploit knowledge of the ground-truth or divide outputs into semantically stable categories. In self-explanation enhancement, preference sets are constructed by categorizing prompts as consistently correct (CC), variable (V), or consistently incorrect (CI), with category-specific pairing strategies. These pairs form data for direct preference optimization (DPO), compelling the LLM to maximize log-likelihood of high-quality, ground-truth-aligned explanations while minimizing it for weaker outputs (Villa-Arenas et al., 2024). The methodology involves:

  • Supervised fine-tuning on downstream tasks (without rationale supervision).
  • Generation and scoring of diverse predictions/explanations.
  • Anchor-based partitioning of outputs and formation of preference pairs.
  • Optimization under DPO with temperature $\beta$ on anchor pairs.

Empirically, this leads to models that maintain or enhance accuracy while generating higher-quality explanations, with performance gains scaling with the fraction of prompts in the V/CI buckets.
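To make the partitioning and the optimization objective concrete, here is a minimal sketch. The bucket rule (all sampled predictions correct, none correct, or mixed) is a straightforward reading of the CC/V/CI categories. The loss is the standard per-pair DPO objective. The sample data and the function names are hypothetical, and the category-specific pairing strategies of the cited work are not reproduced.

```python
import math

def categorize(correct_flags):
    """Bucket a prompt by the correctness of its sampled predictions."""
    if all(correct_flags):
        return "CC"  # consistently correct
    if not any(correct_flags):
        return "CI"  # consistently incorrect
    return "V"       # variable

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) pair.

    Arguments are sequence log-probabilities under the policy (logp_*) and
    under the frozen reference model (ref_logp_*).
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical sampled outputs: (explanation_id, prediction_is_correct).
samples = {
    "prompt_a": [("a1", True), ("a2", True), ("a3", True)],
    "prompt_b": [("b1", True), ("b2", False), ("b3", False)],
    "prompt_c": [("c1", False), ("c2", False), ("c3", False)],
}
buckets = {p: categorize([ok for _, ok in outs]) for p, outs in samples.items()}
print(buckets)

# Loss is lower when the policy already favors the preferred output.
loss_aligned = dpo_loss(-5.0, -9.0, -6.0, -6.0)
loss_misaligned = dpo_loss(-9.0, -5.0, -6.0, -6.0)
print(loss_aligned, loss_misaligned)
```

A larger `beta` sharpens the penalty for deviating from the reference model's relative preferences, which is why it is commonly described as a temperature.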

5. Dual-Path and Modulated Anchoring in Structured Generation

In deep sequence models with U-Net or diffusion backbones, anchoring can be multi-modal and modulated. The LUMA framework introduces dual-path anchoring, combining a temporal anchor (MoCLIP features trained via contrastive learning) and a frequency anchor (low-frequency DCT coefficients of the target motion) (Jia et al., 29 Sep 2025).

Both anchors are adaptively fused with FiLM-modulated scaling/offsets as a function of the diffusion timestep, allowing strong coarse-grained semantic regularization early and fine-grained temporal/frequency refinement later. This accelerates convergence and improves FID/Recall, with ablations confirming both anchors are essential. Limitations include a fixed DCT cutoff and the need to retrain MoCLIP for each domain.
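The frequency anchor and the FiLM-style modulation can be sketched as follows. This is an illustrative toy, not LUMA's architecture: the orthonormal DCT-II is built by hand, the conditioning vector simply concatenates the flattened low-frequency coefficients with a normalized timestep, and the projection weights `w_scale` and `w_shift` are random placeholders for what would be learned layers.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    mat[0] /= np.sqrt(2.0)
    return mat

def frequency_anchor(motion, cutoff):
    """First `cutoff` DCT coefficients along time; motion is (frames, channels)."""
    return (dct_matrix(motion.shape[0]) @ motion)[:cutoff]

def film(features, anchor_embedding, t_frac, w_scale, w_shift):
    """FiLM: per-channel scale and shift computed from anchor and timestep."""
    cond = np.concatenate([anchor_embedding, [t_frac]])
    gamma = cond @ w_scale  # (hidden,)
    beta = cond @ w_shift   # (hidden,)
    return gamma * features + beta

rng = np.random.default_rng(0)
frames, channels, cutoff, hidden = 60, 6, 8, 32

# Toy target motion: one sinusoid per channel.
time = np.linspace(0.0, 1.0, frames)[:, None]
motion = np.sin(2.0 * np.pi * time * np.arange(1, channels + 1))

anchor = frequency_anchor(motion, cutoff)   # (cutoff, channels)
C = dct_matrix(frames)
smooth = C[:cutoff].T @ anchor              # low-pass reconstruction

# Placeholder (untrained) projections from conditioning to FiLM parameters.
cond_dim = cutoff * channels + 1
w_scale = rng.normal(scale=0.05, size=(cond_dim, hidden))
w_shift = rng.normal(scale=0.05, size=(cond_dim, hidden))
features = rng.normal(size=(frames, hidden))
modulated = film(features, anchor.ravel(), t_frac=0.9,
                 w_scale=w_scale, w_shift=w_shift)
print(anchor.shape, smooth.shape, modulated.shape)
```

Because the cutoff is fixed, any motion content above the retained frequencies is invisible to this anchor, which is one way to read the fixed-cutoff limitation noted above.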

6. Anchoring Bias and Model Agreement in LLMs

Anchoring effects, traditionally conceptualized as cognitive biases in humans, manifest in LLMs as measurable shifts in generated output distributions in response to numeric or categorical anchor cues (Valencia-Clavijo, 7 Nov 2025). Model agreement is quantified by both behavioral (difference in soft expected value $\Delta EV$) and attributional (Shapley value for the anchor field) analyses, integrated into an Anchoring Bias Sensitivity Score (ABSS). Experiments confirm:

  • Robust anchoring effects (positive $\Delta EV$ and $\Delta\phi$) in large models (Gemma-2B, phi-2, Llama-2-7B).
  • Attributional fragility in small models (e.g., GPT-Neo-125M), suggesting possible misleading surface agreement.
  • ABSS combines strength, statistical significance, and concordance of behavioral and attributional signals.

The results indicate that anchoring in LLMs is internally driven by log-probability mass reweighting, not just output copycatting, and this has concrete implications for safety in domains where spurious cues may drive systematic errors.
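The behavioral signal can be illustrated with a toy computation. The precise definition of $\Delta EV$ in the cited work is not given here, so this sketch assumes one plausible reading: the soft expected value is the probability-weighted mean of a fixed set of candidate numeric answers, and $\Delta EV$ is the difference between a high-anchor and a low-anchor prompt. The candidate values and logits are invented for the example.

```python
import numpy as np

def soft_expected_value(logits, values):
    """Expected numeric answer under a softmax over candidate-answer logits."""
    z = np.asarray(logits, dtype=float)
    p = np.exp(z - z.max())  # subtract the max for numerical stability
    p /= p.sum()
    return float(p @ np.asarray(values, dtype=float))

# Hypothetical candidate answers and the model's logits for each of them
# when the prompt contains a low versus a high numeric anchor.
candidates = [10, 20, 30, 40, 50]
logits_low_anchor = [2.0, 1.5, 0.5, -0.5, -1.0]
logits_high_anchor = [-1.0, -0.5, 0.5, 1.5, 2.0]

ev_low = soft_expected_value(logits_low_anchor, candidates)
ev_high = soft_expected_value(logits_high_anchor, candidates)
delta_ev = ev_high - ev_low
print(f"EV(low)={ev_low:.2f}  EV(high)={ev_high:.2f}  delta_EV={delta_ev:.2f}")
```

A positive `delta_ev` indicates that probability mass moved toward the anchor across the whole candidate set, which is a different signal from the model merely echoing the anchor value as its top answer.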

7. Trade-offs, Practical Implications, and Open Questions

Anchoring provides theoretically grounded tools for controlling model disagreement and exploiting reference structures for reliability, alignment, and interpretability. Key considerations include:

  • Model parameter scaling (ensemble size, rounds, width) yields tighter agreement, explaining the empirical stability of large models.
  • Agreement bounds derived via anchoring can guide resource allocation: choosing $k$, $n$, or $d$ to balance accuracy and robustness.
  • Algorithmic extensions exist for alternative metrics (e.g., Jaccard, Spearman) and broader settings (multi-modal/multitask).
  • Limitations: Results are population-level; finite-sample corrections are not explicit; extension to non-convex or classification losses remains open (Eaton et al., 26 Feb 2026).

A plausible implication is that anchoring can be systematically adapted to diverse paradigms and modalities, but careful calibration, anchor construction, and understanding of internal model dynamics remain active research frontiers.
