Dual-Encoder Contrastive Objectives

Updated 21 April 2026
  • Dual-encoder contrastive objectives are a framework that trains two parallel networks to encode inputs so that related pairs have closer embeddings while unrelated pairs are separated.
  • They are widely applied in cross-modal retrieval and recommendation systems, leveraging paired comparisons to improve semantic alignment.
  • Practical implementations include temperature scaling, momentum updates, and efficient negative sampling to optimize the balance between bias and variance in learning.

The term ABC-parametrization denotes two distinct, context-dependent statistical parameterization strategies that leverage auxiliary constraints for analytic or computational tractability. The first arises in latent variable model estimation, specifically in hidden Markov models (HMMs) with analytically intractable emission likelihoods; here, it refers to the introduction of approximate Bayesian computation (ABC) kernels and a pseudo-likelihood controlled by an $\epsilon$-parameter. The second usage appears in regression models with categorical covariates and interactions, where abundance-based constraints (ABC) define a reparametrization of categorical effects to enable efficient, interpretable, and equitable estimation. The following sections delineate the formal machinery, methodological workflow, theoretical guarantees, and empirical properties of both forms.

1. ABC-Parametrization in Hidden Markov Models

When the emission density $g_\theta(y_n|x_n)$ of an HMM cannot be evaluated in closed form but can be simulated from, an auxiliary-likelihood construction enables parameter estimation via ABC (Ehrlich et al., 2012). The standard HMM comprises a state space $X\subset\mathbb{R}^n$, observation space $Y\subset\mathbb{R}^m$, and static parameter $\theta\in\Theta\subset\mathbb{R}^d$. The emission process admits samples $u\sim g_\theta(\cdot|x_n)$ for any $x_n$, yet the density $g_\theta(y_n|x_n)$ is not directly available.

The ABC-parametrization replaces the intractable likelihood component $g_\theta(y_n|x_n)$ in the joint smoothing density with the ABC surrogate:

$$g_{\theta,\epsilon}(y_k|x_k) \equiv \frac{1}{C_\epsilon} \int K_\epsilon(y_k|u)\, g_\theta(u|x_k)\, du,$$

where $K_\epsilon(y_k|u)$ is a kernel function (e.g., uniform, Gaussian) centered at $y_k$ with tolerance $\epsilon$, and $C_\epsilon$ normalizes the perturbation. The overall ABC-approximated marginal likelihood is:

$$p_{\theta,\epsilon}(y_{1:n}) = \prod_{k=1}^{n} p_{\theta,\epsilon}(y_k|y_{1:k-1}),$$

where the one-step predictive densities marginalize over the intractable emission via kernel-weighted simulation.
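The surrogate can be approximated by plain Monte Carlo: simulate pseudo-observations from the emission model and average the kernel over them. The following is a minimal sketch, assuming a hypothetical Gaussian emission model (`toy_emission`) and a normalized Gaussian kernel, so that the normalizing constant equals 1; neither choice comes from the source. For this toy case the surrogate has a closed form, a normal density with variance $1+\epsilon^2$, which gives a convenient check.

```python
import math
import random


def gaussian_kernel(y, u, eps):
    """Normalized Gaussian kernel K_eps(y | u) with bandwidth eps."""
    z = (y - u) / eps
    return math.exp(-0.5 * z * z) / (eps * math.sqrt(2.0 * math.pi))


def abc_likelihood(y, x, eps, simulate, n_draws, rng):
    """Monte Carlo estimate of g_{theta,eps}(y | x).

    Averages the kernel over pseudo-observations u ~ g_theta(. | x);
    the emission density itself is never evaluated.
    """
    total = 0.0
    for _ in range(n_draws):
        u = simulate(x, rng)
        total += gaussian_kernel(y, u, eps)
    return total / n_draws


def toy_emission(x, rng):
    # Hypothetical emission model (an assumption): u = x + N(0, 1) noise.
    return x + rng.gauss(0.0, 1.0)


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


rng = random.Random(0)
eps = 0.5
estimate = abc_likelihood(0.3, 0.0, eps, toy_emission, 100_000, rng)
# For this toy model the surrogate is N(y; x, 1 + eps^2) in closed form.
exact = normal_pdf(0.3, 0.0, 1.0 + eps ** 2)
print(estimate, exact)
```

The gap between `exact` and the true density `normal_pdf(0.3, 0.0, 1.0)` is the ABC bias for this toy model; it shrinks as `eps` decreases.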

The key theoretical result is an $\mathcal{O}(n\epsilon)$ upper bound on the bias of the log-likelihood and of its gradient between the true and ABC marginal likelihoods, where $n$ is the number of observations, assuming Lipschitz continuity and boundedness conditions for the transition and emission densities and their parameter gradients. This suggests that, for moderate $n$ and $\epsilon$, the ABC-induced error can remain small relative to particle filter Monte Carlo error.

2. Particle Filter Implementation and Parameter Estimation

Efficient computation under the ABC-parametrization is achieved via a sequential Monte Carlo (SMC) particle filter using $N$ particles and pseudo-observation draws:

  1. Initialization: Draw particles $x_0^{(i)}$, $i=1,\dots,N$, from the initial state distribution, with weights $w_0^{(i)}=1/N$.
  2. Resampling: If the effective sample size of the weights is low, resample ancestors.
  3. Propagation: Propose $x_k^{(i)}$ from the state transition given its ancestor and sample a pseudo-observation $u_k^{(i)}\sim g_\theta(\cdot|x_k^{(i)})$.
  4. Weighting: Compute $w_k^{(i)}\propto w_{k-1}^{(i)}K_\epsilon(y_k|u_k^{(i)})$ and normalize.
  5. Marginal-Likelihood Estimation: The estimated marginal contribution at step $k$ is the weighted average of the kernel values, $\sum_{i} w_{k-1}^{(i)}K_\epsilon(y_k|u_k^{(i)})$; the overall marginal likelihood estimate is the product of these contributions over $k=1,\dots,n$. A second-order bias correction may be applied to the log-likelihood.
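The steps above can be sketched for a toy model. This is an illustration under stated assumptions, not the authors' implementation: the AR(1) state-space model, the Gaussian kernel, multinomial resampling, the function names, and the use of one pseudo-observation per particle are all choices made here for brevity, and the bias correction is omitted.

```python
import math
import random


def gaussian_kernel(y, u, eps):
    """Normalized Gaussian kernel K_eps(y | u) with bandwidth eps."""
    z = (y - u) / eps
    return math.exp(-0.5 * z * z) / (eps * math.sqrt(2.0 * math.pi))


def simulate_data(theta, n_obs, rng):
    """Simulate observations from the toy model (used to build a test series)."""
    x, ys = rng.gauss(0.0, 1.0), []
    for _ in range(n_obs):
        x = theta * x + rng.gauss(0.0, 1.0)
        ys.append(x + rng.gauss(0.0, 1.0))
    return ys


def abc_log_likelihood(ys, theta, eps, n_particles, rng, ess_fraction=0.5):
    """ABC particle-filter estimate of log p_{theta,eps}(y_{1:n}).

    Hypothetical model: x_k = theta * x_{k-1} + N(0, 1) and
    pseudo-observation u_k = x_k + N(0, 1). Emissions are only simulated.
    """
    n = n_particles
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]           # 1. initialization
    ws = [1.0 / n] * n
    log_lik = 0.0
    for y in ys:
        ess = 1.0 / sum(w * w for w in ws)
        if ess < ess_fraction * n:                          # 2. resample if ESS is low
            xs = rng.choices(xs, weights=ws, k=n)
            ws = [1.0 / n] * n
        xs = [theta * x + rng.gauss(0.0, 1.0) for x in xs]  # 3. propagate states
        us = [x + rng.gauss(0.0, 1.0) for x in xs]          #    draw pseudo-observations
        ks = [gaussian_kernel(y, u, eps) for u in us]       # 4. kernel values
        incr = sum(w * k for w, k in zip(ws, ks))           # 5. marginal contribution
        if incr == 0.0:
            return float("-inf")                            # all kernel values underflowed
        log_lik += math.log(incr)
        ws = [w * k / incr for w, k in zip(ws, ks)]         # normalized weights
    return log_lik


rng = random.Random(1)
ys = simulate_data(theta=0.8, n_obs=100, rng=rng)
ll_true = abc_log_likelihood(ys, theta=0.8, eps=0.5, n_particles=300, rng=rng)
ll_wrong = abc_log_likelihood(ys, theta=-0.8, eps=0.5, n_particles=300, rng=rng)
print(ll_true, ll_wrong)
```

Comparing the estimate at the data-generating value with a clearly wrong value gives a quick sanity check that the filter carries information about $\theta$.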

Parameter updates are performed online using simultaneous perturbation stochastic approximation (SPSA). At iteration $t$, two SMC filters are run at $\theta_t + c_t\Delta_t$ and $\theta_t - c_t\Delta_t$, where $\Delta_t$ is a vector of independent Rademacher random variables and $c_t$ is the perturbation size. The gradient estimate for component $j$ is $\hat{g}_{t,j} = [\hat\ell(\theta_t + c_t\Delta_t) - \hat\ell(\theta_t - c_t\Delta_t)]/(2c_t\Delta_{t,j})$, where $\hat\ell$ denotes the estimated ABC log-likelihood; parameter updates proceed as $\theta_{t+1} = \theta_t + a_t\hat{g}_t$ with suitable diminishing step sizes $a_t$.
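A generic SPSA loop may look as follows. This is a sketch only: a deterministic quadratic stands in for the particle-filter log-likelihood estimate, and the gain constants `a` and `c` and the decay exponents 0.602 and 0.101 are commonly used defaults assumed here rather than values taken from the source.

```python
import math
import random


def spsa_maximize(objective, theta0, n_iters, rng,
                  a=0.1, c=0.1, alpha=0.602, gamma=0.101):
    """Maximize an objective with SPSA using two evaluations per iteration."""
    theta = list(theta0)
    for t in range(n_iters):
        a_t = a / (t + 1) ** alpha                          # diminishing step size
        c_t = c / (t + 1) ** gamma                          # diminishing perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]    # Rademacher directions
        plus = [th + c_t * d for th, d in zip(theta, delta)]
        minus = [th - c_t * d for th, d in zip(theta, delta)]
        diff = objective(plus) - objective(minus)
        grad = [diff / (2.0 * c_t * d) for d in delta]      # per-component estimate
        theta = [th + a_t * g for th, g in zip(theta, grad)]  # ascent update
    return theta


def toy_objective(theta):
    # Stand-in for an estimated log-likelihood; maximized at (1, -2).
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)


def distance_to_optimum(theta):
    return math.hypot(theta[0] - 1.0, theta[1] + 2.0)


rng = random.Random(0)
theta0 = [4.0, -3.0]
theta_hat = spsa_maximize(toy_objective, theta0, n_iters=2000, rng=rng)
print(theta_hat, distance_to_optimum(theta_hat))
```

Each iteration costs two objective evaluations regardless of the dimension of $\theta$, which is the main appeal when every evaluation is a full particle-filter run.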

3. Bias–Variance Trade-offs and Numerical Properties

Empirical studies demonstrate fundamental trade-offs in the ABC parameter $\epsilon$ and the Monte Carlo sample size $N$, as well as the pseudo-observation replicate number $M$:

  • Bias in the marginal likelihood and parameter gradients is bounded by $\mathcal{O}(n\epsilon)$; variance increases as $\epsilon \to 0$ due to weight degeneracy.
  • For fixed $N$, increasing $\epsilon$ stabilizes particle weights but increases estimator bias; reducing $\epsilon$ shrinks bias but amplifies variance.
  • Variance of estimates typically scales as $\mathcal{O}(1/N)$, with improvements for larger $M$.
  • In practical scenarios (e.g., Lorenz '63 model), an empirically optimal midrange $\epsilon$ is suggested to balance bias and variance, with bias nearly linear in $\epsilon$ and variance inversely proportional to it.

A summary of empirical findings:

| Setting | Bias Behavior | Variance Behavior | Notes |
| --- | --- | --- | --- |
| $\epsilon$ (ABC tolerance) | Grows roughly linearly in $\epsilon$ | Grows as $\epsilon \to 0$ | Bias–variance trade-off, "sweet spot" for midrange $\epsilon$ |
| $N$ (particles) | Const. bias | $\propto 1/N$ | Increasing $N$ reduces estimator variance |
| $M$ (replicates) | Stable bias | Var. decreases | Redundant samples reduce Monte Carlo error |
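The direction of the trade-off can be checked on a toy problem. The sketch below assumes a hypothetical Gaussian emission model and a Gaussian kernel, and evaluates everything at a single point with the observation equal to the state; the bias is computed analytically for that toy case and the variance empirically over repeated estimates. The rates it produces are specific to this example and are not a reproduction of the empirical study.

```python
import math
import random
import statistics


def kernel_average(eps, n_draws, rng):
    """One Monte Carlo estimate of the ABC likelihood at y = x = 0."""
    total = 0.0
    for _ in range(n_draws):
        u = rng.gauss(0.0, 1.0)              # hypothetical emission: u ~ N(0, 1)
        z = (0.0 - u) / eps
        total += math.exp(-0.5 * z * z) / (eps * math.sqrt(2.0 * math.pi))
    return total / n_draws


def bias_and_variance(eps, n_draws=50, n_reps=400, seed=0):
    """Return (analytic |bias| of the surrogate, empirical estimator variance)."""
    rng = random.Random(seed)
    estimates = [kernel_average(eps, n_draws, rng) for _ in range(n_reps)]
    true_density = 1.0 / math.sqrt(2.0 * math.pi)                     # N(0; 0, 1)
    abc_density = 1.0 / math.sqrt(2.0 * math.pi * (1.0 + eps ** 2))   # N(0; 0, 1 + eps^2)
    return abs(abc_density - true_density), statistics.variance(estimates)


for eps in (0.05, 0.2, 1.0):
    bias, var = bias_and_variance(eps)
    print(f"eps={eps}: |bias|={bias:.4f}, variance={var:.5f}")
```

Small tolerances give a nearly unbiased but noisy estimate because few pseudo-observations land close to the data; large tolerances smooth the noise at the cost of bias.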

4. ABC-Parametrization for Regression with Categorical Interactions

The abundance-based constraints (ABC) parametrization for categorical-modified regression models addresses challenges inherent in traditional codings (e.g., reference-group or sum-to-zero constraints) when modeling main and interaction effects of categorical covariates (Kowal, 2024).

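Given data $\{(x_i, g_i, h_i, y_i)\}_{i=1}^n$ with categorical variables $g$ and $h$ of $L_g$ and $L_h$ levels, the cat-modified linear model includes main effects, categorical–continuous, and categorical–categorical interactions. In schematic form, with a single continuous covariate $x$:

$$y_i = \alpha_0 + \alpha_1 x_i + \beta_{g_i} + \delta_{h_i} + \gamma_{g_i} x_i + \eta_{g_i h_i} + \varepsilon_i.$$

ABC constraints impose that category-level effects are centered by their empirical proportions:

$$\sum_{g} \hat\pi_g \beta_g = 0, \qquad \sum_{h} \hat\pi_h \delta_h = 0, \qquad \sum_{g} \hat\pi_g \gamma_g = 0,$$

and for categorical–categorical interactions,

$$\sum_{g} \hat\pi_{gh}\, \eta_{gh} = 0 \ \text{ for each } h, \qquad \sum_{h} \hat\pi_{gh}\, \eta_{gh} = 0 \ \text{ for each } g,$$

where $\hat\pi_g$ is the empirical proportion of group $g$ and $\hat\pi_{gh}$ is the joint empirical proportion of the pair $(g,h)$.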
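As a small worked case (a sketch under simplifying assumptions, not taken from the source), consider a one-way model with a single categorical covariate and no continuous covariate. The symbols $\hat\pi_g = n_g/n$ (empirical proportion of group $g$), $\bar{y}_g$ (sample mean in group $g$), $\bar{y}$ (overall sample mean), and $n_g$ (group size) are introduced here for illustration. The least-squares fit reproduces the group means, and the abundance-based constraint then pins the intercept to the overall mean:

```latex
\begin{align*}
\hat\alpha_0 + \hat\beta_g &= \bar{y}_g \quad \text{for each } g, \\
\hat\alpha_0 + \sum_g \hat\pi_g \hat\beta_g
  &= \sum_g \hat\pi_g \bar{y}_g
   = \frac{1}{n} \sum_g n_g \bar{y}_g
   = \bar{y}, \\
\sum_g \hat\pi_g \hat\beta_g = 0
  \;&\Longrightarrow\;
  \hat\alpha_0 = \bar{y}, \qquad \hat\beta_g = \bar{y}_g - \bar{y}.
\end{align*}
```

The second line uses the fact that the proportions sum to one. Each categorical effect is then the deviation of a group mean from the overall mean.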

5. Estimation Invariance, Power, and Interpretation Advantages

Main effect estimates—continuous slopes and categorical effects—are preserved under ABCs, even when categorical modifiers (interactions) are added. Under equal variance (or covariance) of covariates within groups, analytic results guarantee:

  • Invariance: Estimators for intercept and main effects are identical across models with and without interactions (e.g., in ANCOVA and two-way ANOVA).
  • Efficient Standard Errors: Addition of interaction terms under ABCs does not inflate, and often reduces, the standard errors (SEs) of main effect estimates. This reflects a reduction in the model's residual sum of squares.
  • Interpretability: Main effects under ABC parametrization coincide with abundance-weighted population or group averages; interaction coefficients represent group deviations.

Contrast with traditional codings:

| Coding Type | Main Effect Interpretation | SE Behavior on Interaction Inclusion | Reference Group Bias |
| --- | --- | --- | --- |
| Reference-Group | Effect in reference group | SEs may inflate/change | Yes |
| Sum-to-Zero | Average effect (unweighted) | No special invariance | No |
| ABC (abundance) | Abundance-weighted group averages | SEs never increase; often decrease | No |
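The interpretations in the comparison above can be made concrete for the intercept of a one-way model, where each coding has a closed form in terms of group means. The sketch below uses made-up data with unbalanced groups; the function name and the data are illustrative assumptions.

```python
from collections import defaultdict


def one_way_intercepts(groups, ys, reference):
    """Intercept of a one-way ANOVA fit under three identifying constraints."""
    by_group = defaultdict(list)
    for g, y in zip(groups, ys):
        by_group[g].append(y)
    means = {g: sum(v) / len(v) for g, v in by_group.items()}    # group means
    props = {g: len(v) / len(ys) for g, v in by_group.items()}   # empirical proportions
    return {
        # Reference coding: the reference group's effect is fixed at zero.
        "reference": means[reference],
        # Sum-to-zero coding: unweighted average of the group means.
        "sum_to_zero": sum(means.values()) / len(means),
        # ABC: abundance-weighted average, equal to the overall mean.
        "abc": sum(props[g] * means[g] for g in means),
    }


# Made-up unbalanced data: group "a" dominates the sample.
groups = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
ys = [1.0] * 6 + [4.0] * 3 + [10.0] * 1
result = one_way_intercepts(groups, ys, reference="a")
overall_mean = sum(ys) / len(ys)
print(result, overall_mean)
```

With unbalanced groups the three intercepts differ: the reference coding reports the mean of group "a", the sum-to-zero coding gives the small group "c" the same weight as the others, and the ABC intercept matches the overall sample mean.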

6. Implementation, Theoretical Conditions, and Examples

Implementation of ABC regression requires:

  • Construction of the full overparametrized design matrix,
  • Computation of categorical and joint categorical proportions,
  • Formation of the constraint matrix encoding the abundance-based constraints,
  • QR decomposition to obtain a reduced basis for reparametrization,
  • Solving the unconstrained regression in the lower-dimensional space.
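The steps listed above can be sketched with NumPy for one continuous covariate and one categorical modifier. This is an illustrative sketch, not the reference implementation: the function name `fit_abc`, the coefficient ordering (intercept, slope, group effects, group-by-slope effects), and the simulated data are assumptions made here.

```python
import numpy as np


def fit_abc(x, g, y, n_levels):
    """Least-squares fit of y ~ 1 + x + group + x:group under ABCs."""
    onehot = np.eye(n_levels)[g]                          # n x L indicator matrix
    # Full overparametrized design: intercept, slope, group, group-by-slope.
    X = np.column_stack([np.ones_like(x), x, onehot, onehot * x[:, None]])
    pi_hat = onehot.mean(axis=0)                          # empirical group proportions
    A = np.zeros((2, X.shape[1]))                         # constraint matrix, A @ coef = 0
    A[0, 2:2 + n_levels] = pi_hat                         # sum_g pi_g * beta_g = 0
    A[1, 2 + n_levels:] = pi_hat                          # sum_g pi_g * gamma_g = 0
    Q, _ = np.linalg.qr(A.T, mode="complete")
    basis = Q[:, A.shape[0]:]                             # columns span the null space of A
    # Unconstrained regression in the lower-dimensional space.
    zeta, *_ = np.linalg.lstsq(X @ basis, y, rcond=None)
    return basis @ zeta, A, pi_hat                        # map back to full coefficients


rng = np.random.default_rng(0)
n = 300
g = rng.choice(3, size=n, p=[0.6, 0.3, 0.1])
x = rng.normal(size=n)
x = x - x.mean()                                          # center the covariate
y = (1.0 + 2.0 * x + np.array([0.5, -1.0, 2.0])[g]
     + np.array([0.3, -0.2, 0.1])[g] * x + rng.normal(scale=0.5, size=n))

coef, A, pi_hat = fit_abc(x, g, y, n_levels=3)
# Separate per-group regressions; np.polyfit returns [slope, intercept].
per_group = [np.polyfit(x[g == k], y[g == k], 1) for k in range(3)]
print(coef[:2], A @ coef)
```

Because the saturated model reproduces separate regressions per group, the fitted main slope `coef[1]` equals the abundance-weighted average of the per-group slopes, which illustrates the interpretation of main effects under ABCs.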

Theoretical validity depends on centering of covariates and, for strongest invariance results, on homogeneity of group covariance matrices for covariates. Empirical studies show that near-invariance holds under mild deviations from these conditions.

Illustrative examples compare estimates and SEs in main-only versus cat-modified models and confirm empirically that ABC-based main effect estimates remain unchanged while their SEs do not increase, in contrast to non-ABC encodings.

ABC-parametrization thus provides consistent, interpretable, and equitable estimation procedures in both intractable latent variable models (via kernel-ABC approximation) and categorical regression settings (via data-driven centering constraints), supported by theoretical guarantees and verified by simulation and real data analysis (Ehrlich et al., 2012; Kowal, 2024).
