Model Proxies in ML and Inference
- Model proxies are surrogate models that approximate target functions, properties, or latent variables for efficiency and interpretability.
- They are implemented via methods like deep metric learning, regression, and neural networks to reduce computational cost and enhance scalability.
- Their applications span federated learning, fairness auditing, causal inference, and scientific modeling, demonstrating versatile practical impact.
A model proxy is a surrogate statistical or computational construct that approximates a target function, class, property, or latent variable, often for reasons of computational efficiency, tractability, privacy, or interpretability. The concept pervades machine learning, causal inference, federated optimization, fairness auditing, and scientific modeling, with instantiations that range from learned prototype vectors in metric learning to noisy classifiers for sensitive attribute inference. Model proxies can take the form of simple regression models, neural network predictors, clustering prototypes, or index-specific surrogates, and frequently serve as stand-ins for more expensive, complex, inaccessible, or latent processes.
1. Proxy-Based Deep Metric Learning
In deep metric learning, proxies are compact, learnable class representatives in the embedding space, designed to replace expensive pairwise or triplet computations with single comparisons to class prototypes. For example, ProxyNCA++ (Teh et al., 2020) associates each class $c$ with a unique proxy vector $p_c$, learned as a parameter and constrained post-optimization to the unit hypersphere. During training, embeddings are “pulled” towards their class proxy and “pushed” away from all others:

$$\mathcal{L}(x, y) = -\log \frac{\exp\!\left(-d(x, p_y)/T\right)}{\sum_{c} \exp\!\left(-d(x, p_c)/T\right)},$$

where $d$ is the squared Euclidean distance between L2-normalized vectors and $T$ a temperature, with low temperature producing peaked assignment probabilities. Fast-moving proxies employ an elevated learning rate to keep pace with the evolving embedding space, and the pooling choice (global max vs. global average pooling) further impacts alignment and discriminativity. ProxyNCA++ demonstrates an average improvement of $22.9$ percentage points in Recall@1 across four retrieval datasets, with state-of-the-art class separation and scalability (Teh et al., 2020). Similarly, Soft Orthogonal Proxies (Saberi-Movahed et al., 2023) introduce a regularization term favoring orthogonal proxy placement, penalizing the deviation of the proxy Gram matrix from the identity:

$$\mathcal{L}_{\text{ortho}} = \left\lVert P^\top P - I \right\rVert_F^2,$$

minimizing redundancy between class representatives and enhancing inter-class discrimination.
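The pull/push mechanism of a proxy-based loss fits in a few lines of NumPy. This is an illustrative single-sample sketch (squared Euclidean distance over unit-normalized vectors, temperature-scaled softmax), not the authors' implementation:

```python
import numpy as np

def proxy_nca_loss(embedding, proxies, label, temperature=0.1):
    """Proxy-NCA-style loss for one sample: negative log-probability of
    assignment to the correct class proxy, under a temperature-scaled
    softmax over negative squared distances to every proxy."""
    # L2-normalize embedding and proxies (unit-hypersphere constraint).
    e = embedding / np.linalg.norm(embedding)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    # Squared Euclidean distance to each class proxy.
    d2 = np.sum((p - e) ** 2, axis=1)
    logits = -d2 / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[label])

# Toy check: an embedding near its own class proxy incurs a small loss.
rng = np.random.default_rng(0)
proxies = rng.normal(size=(4, 8))
x = proxies[2] + 0.01 * rng.normal(size=8)
assert proxy_nca_loss(x, proxies, 2) < proxy_nca_loss(x, proxies, 0)
```

A low `temperature` sharpens the assignment distribution, which is exactly the peaking effect described above.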
2. Proxies for Fast Approximate Set Selection
In large-scale retrieval and filtering tasks, proxy models are cheap classifiers or regressors that subsample or score records for more costly downstream evaluation by an oracle (human or high-end DNN). Importance-sampling frameworks such as SUPG (Kang et al., 2020) employ these proxies as scoring functions over records, targeting hard guarantees for recall or precision. The count of oracle-positive records is estimated by sampling from a proxy-derived distribution $q$ and reweighting:

$$\hat{N}_+ = \frac{1}{n} \sum_{i=1}^{n} \frac{O(x_i)}{q(x_i)}, \qquad x_i \sim q,$$

where $O$ is the oracle label and $q$ is chosen to minimize estimator variance using the proxy scores. This approach yields substantial reductions in labeling cost or result size compared to uniform sampling, and provides asymptotic confidence intervals for query returns. Rigorous calibration and conditional-independence assumptions are critical, especially when proxies are noisy, adversarial, or correlated (Kang et al., 2020).
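A toy illustration of proxy-guided importance sampling (not the full SUPG algorithm with its statistical guarantees): sample records proportionally to an informative proxy score, consult the expensive oracle only on the sample, and form a Horvitz-Thompson estimate of the positive count. All names and numbers are illustrative:

```python
import numpy as np

def ht_positive_count(scores, oracle, n_sample, rng):
    """Horvitz-Thompson estimate of the number of oracle-positive records:
    sample proportionally to proxy scores, reweight by inverse sampling
    probability. Only sampled records are sent to the oracle."""
    probs = scores / scores.sum()
    idx = rng.choice(len(scores), size=n_sample, replace=True, p=probs)
    return np.mean(oracle[idx] / probs[idx])

rng = np.random.default_rng(1)
N = 10_000
truth = (rng.random(N) < 0.05).astype(float)    # ~5% true positives
# Informative proxy: scores concentrate on the true positives.
scores = 0.1 + 0.8 * truth + 0.1 * rng.random(N)
est = ht_positive_count(scores, truth, n_sample=2000, rng=rng)
# est is close to truth.sum() (~500) despite labeling only 2000 records.
```

Because the proxy concentrates samples on likely positives, the estimator's variance is far below that of uniform sampling at the same labeling budget.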
3. Proxy Use and Detection in Model Fairness and Discrimination
A “proxy variable” or proxy model in fairness auditing refers to an input, intermediate variable, or model component that is correlated with a protected attribute and influential in the model’s predictions. In linear regression (Yeom et al., 2018), proxy detection is formalized as identifying a linear component $P = \alpha^\top x$ of the model that exhibits both high association with the protected attribute $Z$ and high influence on the output:

$$\mathrm{Assoc}(P, Z) \geq \epsilon \quad \text{and} \quad \mathrm{Infl}(P) \geq \delta.$$
Proxy detection is formulated as a second-order cone program (SOCP) with constraints on association and influence, extendable to “business necessity” legally exempt features. Proxies are crucial in explaining, mitigating, or regulating discriminatory behavior, and empirical studies confirm their prevalence and tractability (Yeom et al., 2018). In fairness estimation with inaccessible sensitive attributes, weak proxies (with merely better-than-random accuracy) suffice, provided they are informative and conditionally independent. Calibrated procedures with three proxies enable provably accurate fairness estimation, even as proxy quality degrades (Zhu et al., 2022).
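A minimal sketch of fairness estimation with a weak proxy, under simplifying assumptions (a symmetric proxy with known accuracy $q$, conditionally independent of the prediction given the true attribute); the calibrated three-proxy procedure of Zhu et al. (2022) is far more general:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
a = rng.random(n) < 0.5                        # true sensitive attribute
yhat = rng.random(n) < np.where(a, 0.7, 0.5)   # positive rate differs by group
# Weak proxy: only 70% accurate, flips independently of yhat given a.
proxy = np.where(rng.random(n) < 0.7, a, ~a)

def dp_gap(group, pred):
    """Demographic-parity gap: |P(pred=1 | g=1) - P(pred=1 | g=0)|."""
    return abs(pred[group].mean() - pred[~group].mean())

true_gap = dp_gap(a, yhat)       # ~0.20, unobservable in practice
naive_gap = dp_gap(proxy, yhat)  # attenuated by proxy noise (~0.08)
# Under these assumptions the plug-in gap shrinks by the factor (2q - 1),
# so knowing (or estimating) the proxy accuracy q lets us de-bias it.
corrected = naive_gap / (2 * 0.7 - 1)
```

The corrected estimate recovers the true gap even though the proxy is only modestly better than random, matching the intuition that informative, conditionally independent proxies suffice.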
4. Model Proxies in Causal Inference and Identification
Proxies function as critical identification tools in causal inference under unmeasured confounding. Proxy controls (Deaner, 2018) are observed variables informative about latent confounders $U$, partitioned into outcome-aligned ($W$) and treatment-aligned ($Z$) proxies. Identification leverages conditional-independence and completeness assumptions, employing Fredholm operator equations and nonparametric bridge functions to recover average potential outcomes. In panel data, time-lagged outcomes and future treatments serve as proxies for unobserved heterogeneity. Proximal frameworks, further extended to adaptive selection of valid proxies from among many potentially invalid ones (Rakshit et al., 25 Jul 2025), solve for causal effects via penalized regression and the median trick, attaining root-$n$ consistency and robust confidence intervals.
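In the fully linear special case, proximal identification collapses to two-stage least squares with the proxies acting as instruments. The following sketch is illustrative only, not the nonparametric bridge-function estimator; all coefficients are made up:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
u = rng.normal(size=n)                     # unmeasured confounder U
a = 0.8 * u + rng.normal(size=n)           # treatment, confounded by U
z = u + rng.normal(size=n)                 # treatment-aligned proxy Z
w = u + rng.normal(size=n)                 # outcome-aligned proxy W
y = 2.0 * a + u + rng.normal(size=n)       # true causal effect tau = 2

def p2sls(y, a, w, z):
    """Linear proximal two-stage least squares: instrument the regressors
    (1, a, w) with (1, a, z). The coefficient on `a` identifies the causal
    effect even though U is never observed."""
    X = np.column_stack([np.ones_like(a), a, w])
    Z = np.column_stack([np.ones_like(a), a, z])
    beta = np.linalg.solve(Z.T @ X, Z.T @ y)
    return beta[1]

naive = np.polyfit(a, y, 1)[0]   # ~2.49: biased upward by confounding
tau_hat = p2sls(y, a, w, z)      # ~2.0: confounding removed via proxies
```

The outcome proxy $W$ absorbs the confounder in the regression, while the treatment proxy $Z$ supplies the exogenous variation needed to pin down its coefficient.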
5. Model Proxies for Efficient ML Inference and Query Acceleration
Proxy models routinely accelerate pipelines in database-integrated ML systems, acting as lightweight filters preceding costly UDFs. Probabilistic predicates (PP) and correlative proxy optimizers (CORE) (Yang et al., 2022) build a lightweight classifier for each predicate, online or offline, and jointly optimize the ordering and per-proxy accuracy assignments for minimal expected cost:

$$\min_{\pi} \; \sum_{i} c_{\pi(i)} \prod_{j < i} s_{\pi(j)} \quad \text{subject to a global accuracy target},$$

where $c_i$ is the cost of proxy $i$ and $s_i$ its pass rate under ordering $\pi$. Accounting for predicate correlations in proxy construction dramatically improves throughput (up to 80%) over baselines and prior independence-based approaches, confirmed via branch-and-bound search and empirical study. Strategic injection of proxies into query rewrite rules ensures global accuracy targets and optimal filtering.
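The ordering problem can be illustrated with a brute-force toy that assumes independent proxy pass rates (CORE's contribution is precisely relaxing this independence assumption); costs and selectivities below are invented:

```python
import numpy as np
from itertools import permutations

def expected_cost(order, cost, selectivity, udf_cost):
    """Expected per-record cost of running proxy filters in `order` before
    the expensive UDF: each filter only sees records that survived the
    previous ones (independent pass rates assumed for simplicity)."""
    total, surviving = 0.0, 1.0
    for i in order:
        total += surviving * cost[i]
        surviving *= selectivity[i]
    return total + surviving * udf_cost

cost = [1.0, 2.0, 0.5]          # cheap proxy classifiers
selectivity = [0.5, 0.1, 0.9]   # fraction of records each lets through
udf_cost = 100.0                # the costly oracle / UDF

best = min(permutations(range(3)),
           key=lambda o: expected_cost(o, cost, selectivity, udf_cost))
# best == (0, 1, 2): the classic rank-by-(1 - s)/c ordering.
```

Real optimizers replace the brute-force `min` with branch-and-bound search and fold in per-proxy accuracy assignments, but the cost model is the same shape.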
6. Proxy Models in Federated Learning and Large-Scale Adaptation
Proxies are increasingly integral to resource-constrained federated learning, privacy preservation, and decentralization. In ProxyFL (Kalra et al., 2021), each client maintains a private model and a public proxy model, the latter trained under differential privacy via DP-SGD and a mutual-learning objective in which each model mixes its own cross-entropy loss with a KL term toward the other's predictions:

$$\mathcal{L}_{\text{proxy}} = (1-\alpha)\,\mathcal{L}_{\mathrm{CE}} + \alpha\, D_{\mathrm{KL}}\!\left(f_{\text{private}} \,\Vert\, f_{\text{proxy}}\right),$$

and symmetrically for the private model.
Only proxies are exchanged via gossip protocols, enabling heterogeneity in private models and per-client $(\epsilon, \delta)$-DP guarantees. In FedPromo (Caligiuri et al., 5 Aug 2025), server-side knowledge distillation first aligns a small CNN proxy with a large foundation model; federated client classifiers then train atop the frozen proxy, with novel regularizers (inactive-class preservation, class de-biasing) to combat non-IID data and semantic drift. Aggregating the classifier heads and plugging them back onto the foundation model yields domain-adapted, privacy-preserving models with minimal client computation.
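A NumPy sketch of the deep-mutual-learning objectives used to couple the private and proxy models (illustrative; ProxyFL additionally privatizes the proxy's gradient updates with DP-SGD):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mutual_learning_losses(private_logits, proxy_logits, labels, alpha=0.5):
    """Each model blends its own cross-entropy with a KL term pulling it
    toward the other model's predictive distribution."""
    p_priv = softmax(private_logits)
    p_prox = softmax(proxy_logits)
    n = len(labels)
    ce = lambda p: -np.log(p[np.arange(n), labels]).mean()
    kl = lambda p, q: np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()
    loss_private = (1 - alpha) * ce(p_priv) + alpha * kl(p_prox, p_priv)
    loss_proxy = (1 - alpha) * ce(p_prox) + alpha * kl(p_priv, p_prox)
    return loss_private, loss_proxy

rng = np.random.default_rng(5)
private_logits = rng.normal(size=(4, 3))
proxy_logits = rng.normal(size=(4, 3))
labels = np.array([0, 1, 2, 0])
l_priv, l_prox = mutual_learning_losses(private_logits, proxy_logits, labels)
```

When the two models agree exactly, both KL terms vanish and each loss reduces to its scaled cross-entropy, so the coupling only acts where predictions diverge.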
7. Proxy Strategies in LLMs and Knowledge Mining
Recent innovations extend proxies to adapting and steering LLMs whose weights are inaccessible via proxy-tuning (Liu et al., 16 Jan 2024): the predictions of a small, tuned LM adjust the outputs of a large, untuned model at inference time through logit arithmetic,

$$\tilde{s}(x) = s_{\text{base}}(x) + \left(s_{\text{tuned}}(x) - s_{\text{untuned}}(x)\right),$$

where $s_{\text{base}}$ are the large model's next-token logits and $s_{\text{tuned}}$, $s_{\text{untuned}}$ those of the small model after and before tuning; the shifted logits are renormalized with a softmax.
This closes most of the gap to direct tuning at a fraction of the computation, and enables task, domain, and temporal adaptation even for proprietary models. In knowledge mining, Falconer (Zhang et al., 1 Oct 2025) uses LLMs as planners and annotators, producing minimal, task-unified proxy models for sequence labeling and classification (get_label, get_span) that achieve near-LLM accuracy at a substantial speed-up and cost reduction.
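The logit arithmetic at the heart of proxy-tuning is easy to sketch over a toy vocabulary; the four-token example is invented:

```python
import numpy as np

def proxy_tuned_logits(base, expert, anti_expert):
    """Proxy-tuning: shift the large base model's next-token logits by the
    difference between a small tuned (expert) and untuned (anti-expert)
    model, then renormalize with a softmax."""
    shifted = base + (expert - anti_expert)
    shifted = shifted - shifted.max()          # numerical stability
    p = np.exp(shifted)
    return p / p.sum()

# Toy vocabulary of 4 tokens: tuning raised the small model's logit for
# token 2, so the steered distribution shifts mass toward token 2.
base = np.array([2.0, 1.0, 0.5, 0.0])
expert = np.array([0.0, 0.0, 3.0, 0.0])
anti = np.zeros(4)
p = proxy_tuned_logits(base, expert, anti)
```

The base model never exposes its weights; only its per-step logits are needed, which is what makes the scheme viable for black-box deployment.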
8. Proxies for Model Explanation, Topic Evaluation, and Scientific Reconstruction
ExplainReduce (Seppäläinen et al., 14 Feb 2025) summarizes a large set of local explanations (e.g., LIME/SHAP surrogates) by greedily selecting a small proxy subset $S$ that maximizes coverage or minimizes loss over the full data:

$$\min_{S,\; |S| \leq k} \; \frac{1}{n} \sum_{i=1}^{n} \min_{g \in S} \ell\!\left(g(x_i), f(x_i)\right),$$

where $f$ is the closed-box model and $g$ ranges over the candidate local models.
This produces a global surrogate explanation with interpretable clusters matching the closed-box behavior. ProxAnn (Hoyle et al., 1 Jul 2025) validates the use of LLM-based proxies for human-aligned topic modeling evaluation, where proxies are prompted to generate labels, fit ratings, and rankings, demonstrating indistinguishable statistical agreement with crowdworkers in both fit and representativeness tasks.
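The greedy proxy-set selection behind ExplainReduce-style summarization can be sketched with a precomputed loss matrix (an illustrative formulation; the actual method works over fitted local models):

```python
import numpy as np

def explain_reduce(loss_matrix, k):
    """Greedy selection: loss_matrix[i, j] is the loss of local explanation
    j on data point i. Repeatedly add the explanation that most reduces the
    mean per-point minimum loss over the selected set."""
    n, m = loss_matrix.shape
    selected, current = [], np.full(n, np.inf)
    for _ in range(k):
        gains = [np.minimum(current, loss_matrix[:, j]).mean()
                 for j in range(m)]
        j = int(np.argmin(gains))
        selected.append(j)
        current = np.minimum(current, loss_matrix[:, j])
    return selected, current.mean()

# Explanations 0 and 1 each cover half the data perfectly; 2 is mediocre
# everywhere. Greedy selection picks the complementary pair.
loss = np.array([
    [0.0, 5.0, 3.0],
    [0.0, 5.0, 3.0],
    [5.0, 0.0, 3.0],
    [5.0, 0.0, 3.0],
])
selected, final_loss = explain_reduce(loss, k=2)
```

The per-point minimum in the objective is what rewards complementary proxies over individually strong but redundant ones.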
In hierarchical scientific modeling, such as hydroclimate reconstruction (Cahill et al., 2022), multiple proxies (e.g., tree rings, speleothems) are fused via hierarchical Bayesian inference to reconstruct latent climate indices over time, employing AR(1) stochastic-process priors and sparse calibration data.
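A drastically simplified stand-in for such a model: fusing two noisy proxy series of a latent AR(1) index with a scalar Kalman filter. The full Bayesian hierarchy handles calibration and uncertainty far more carefully; noise levels here are invented:

```python
import numpy as np

def fuse_proxies_kalman(obs, obs_var, phi=0.8, proc_var=1.0):
    """Fuse K noisy proxy series of one latent AR(1) index with a scalar
    Kalman filter. `obs` is (T, K); each proxy observes the latent state
    directly plus independent noise of variance obs_var[k]."""
    T, K = obs.shape
    x, P = 0.0, proc_var / (1 - phi ** 2)     # stationary prior
    est = np.empty(T)
    for t in range(T):
        x, P = phi * x, phi ** 2 * P + proc_var   # predict
        for k in range(K):                        # update with each proxy
            gain = P / (P + obs_var[k])
            x = x + gain * (obs[t, k] - x)
            P = (1 - gain) * P
        est[t] = x
    return est

rng = np.random.default_rng(4)
T = 500
latent = np.empty(T)
latent[0] = rng.normal()
for t in range(1, T):
    latent[t] = 0.8 * latent[t - 1] + rng.normal()
obs_var = np.array([0.5, 2.0])                # e.g. tree rings vs. speleothems
obs = latent[:, None] + rng.normal(size=(T, 2)) * np.sqrt(obs_var)
est = fuse_proxies_kalman(obs, obs_var)
# The fused estimate tracks the latent index better than either proxy alone.
```

Fusion helps because the measurement precisions add: even the noisier proxy contributes information the cleaner one lacks at each time step.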
References
- ProxyNCA++ (Teh et al., 2020)
- Approximate Selection with Guarantees (Kang et al., 2020)
- Discriminatory Proxies in Linear Regression (Yeom et al., 2018)
- Proxy Controls for Causal Panel Data (Deaner, 2018)
- Deep Metric Learning with Soft Orthogonal Proxies (Saberi-Movahed et al., 2023)
- CORE: Correlative Proxy Models in ML Queries (Yang et al., 2022)
- Bayesian Hydroclimate Reconstruction with Proxies (Cahill et al., 2022)
- ProxAnn: Topic Model Evaluation via LLM Proxies (Hoyle et al., 1 Jul 2025)
- ExplainReduce: Proxy-Based Model Summarization (Seppäläinen et al., 14 Feb 2025)
- ProxyFL: Federated Proxy Model Sharing (Kalra et al., 2021)
- FedPromo: Federated Lightweight Proxy Models (Caligiuri et al., 5 Aug 2025)
- EMQ: Evolving Training-Free Proxies for Quantization (Dong et al., 2023)
- Proxy-Based Fairness Evaluation (Zhu et al., 2022)
- Location Tests with Noisy Proxies (Deutsch et al., 25 Jul 2025)
- Adaptive Proximal Causal Inference (Rakshit et al., 25 Jul 2025)
- Tuning LLMs by Proxy (Liu et al., 16 Jan 2024)
- Falconer: Knowledge Mining with Proxy Agents (Zhang et al., 1 Oct 2025)