
Reparametrization Methods in ML & Statistics

Updated 29 September 2025
  • Reparametrization-based methods are techniques that reformulate models by changing variables to simplify computation, improve gradient estimation, and enforce invariance.
  • They enable low-variance, unbiased gradient estimators in variational inference and Bayesian learning by transforming complex distributions into tractable forms.
  • Applications span semi-parametric inference, differential privacy, and physics-informed networks, providing enhanced computational efficiency and robust performance.

Reparametrization-based methods refer to a broad class of techniques across statistics, optimization, probabilistic modeling, and deep learning in which a problem is reformulated by changing variables (“reparametrizing”)—either analytically or algorithmically—to simplify computation, enhance statistical efficiency, or enforce structural invariance. In contemporary machine learning, the term is most commonly associated with employing differentiable mappings (e.g., the reparameterization trick) to facilitate gradient-based optimization, as well as with exploiting the geometry or symmetries of parametrized models to improve inference and learning. The following sections detail the core principles, methodological developments, computational and statistical implications, and application domains of reparametrization-based approaches, tracing rigorously documented results and techniques from the arXiv literature.

1. Core Principles and Mathematical Foundations

Reparametrization involves expressing a statistical or optimization model in terms of alternative variables to either absorb nuisance structure, expose efficient directions for learning, or ensure invariance under changes of representation. In a general probabilistic model, this can involve introducing a transformation $z = h(\varepsilon, \theta)$ such that sampling from a complex distribution over $z$ becomes equivalent to a deterministic mapping of a simpler (often standard) auxiliary variable $\varepsilon$ combined with differentiable parameter dependencies.

In the context of semi-parametric inference, the approach is typified by expressing the density as $p_s(x;\beta,\eta)$ for $s=1,\dots,S$, where $\beta$ is the parameter of interest and $\eta$ is a nuisance parameter, possibly infinite-dimensional. A least favorable submodel is constructed by profiling out $\eta$ as $\hat{\alpha}_\beta$, and, under suitable conditions, this submodel can be reparametrized further into finite-dimensional coordinates $q_\beta$, yielding $p_s^*(x;\beta,q_\beta)$ (Hirose et al., 2012). This mapping reduces computational complexity by projecting infinite-dimensional spaces to tractable finite ones and enables the explicit calculation of efficient scores and information matrices.

In variational inference, reparameterization by differentiable transformations enables low-variance, unbiased gradient estimators for the evidence lower bound (ELBO). The classic reparameterization trick for Gaussian posteriors (e.g., $z = \mu + \sigma \varepsilon$ with $\varepsilon \sim \mathcal{N}(0,1)$) is generalized to broader distribution families, as with acceptance-rejection reparameterization (Naesseth et al., 2016) or normalizing flows on Lie groups (Falorsi et al., 2019).
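A minimal PyTorch sketch of this Gaussian case is given below; the standard-normal target and single-sample ELBO estimate are illustrative assumptions, not taken from the cited papers:

```python
import torch

# Variational parameters of q(z) = N(mu, sigma^2); log_sigma keeps sigma positive.
mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)

def log_p(z):
    # Illustrative unnormalized target: standard Gaussian log-density (assumption).
    return -0.5 * (z ** 2).sum()

def elbo_estimate():
    sigma = log_sigma.exp()
    eps = torch.randn(2)                  # eps ~ N(0, I): parameter-free noise
    z = mu + sigma * eps                  # reparameterization z = h(eps, theta)
    log_q = torch.distributions.Normal(mu, sigma).log_prob(z).sum()
    return log_p(z) - log_q               # single-sample ELBO estimate

loss = -elbo_estimate()
loss.backward()                           # gradients flow to mu and log_sigma through z
print(mu.grad, log_sigma.grad)
```

Because the noise $\varepsilon$ carries no parameter dependence, the gradient passes deterministically through $z$, which is what yields the low-variance estimator compared to score-function alternatives.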

Reparametrization also extends to model-based transformations: learning invertible mappings $z = g(\varepsilon; \lambda)$, where $\lambda$ are reparametrization parameters, can "warp" the posterior into a form more favorable for MCMC convergence or for implicit variational objectives (Titsias, 2017).

Mathematically, the change-of-variables formula and the structure of Jacobian determinants play a central role in all these works. For a mapping $T: z \to \omega$ and prior $q(z)$, the target density is induced via $p(\omega) = q(z)\,|\det J_T(z)|^{-1}$. Optimization objectives or gradient estimators are accordingly corrected with log-determinant Jacobian terms, which are crucial in, e.g., BRDF importance sampling (Wu et al., 13 May 2025), normalizing flows, and reparametrization gradients.
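The correction can be checked numerically. The sketch below assumes an illustrative elementwise map $T(z) = \exp(z)$ and a standard Gaussian base density $q$, and evaluates $\log p(\omega) = \log q(z) - \log|\det J_T(z)|$ with an explicit Jacobian:

```python
import torch

def T(z):
    # Invertible elementwise map z -> omega = exp(z) (illustrative assumption).
    return torch.exp(z)

def log_pushforward_density(z, log_q):
    # log p(omega) = log q(z) - log |det J_T(z)| for omega = T(z).
    J = torch.autograd.functional.jacobian(T, z)     # (d, d) Jacobian matrix
    _, logabsdet = torch.linalg.slogdet(J)
    return log_q(z) - logabsdet

log_q = lambda z: torch.distributions.Normal(0.0, 1.0).log_prob(z).sum()  # base density q
z = torch.randn(3)
print(log_pushforward_density(z, log_q))  # log-density of omega = exp(z) under the pushforward of q
```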

2. Efficiency and Invariance in Statistical Inference

Reparametrization-based methods provide both statistical and computational efficiency in complex models, especially those featuring nuisance parameters or symmetries:

  • In semi-parametric multisample models, finite-dimensional reparametrization of the least favorable submodel allows the calculation of efficient scores and information in closed form under moment and normalization constraints. For the parameter of interest $\beta$, the efficient score is expressed (with centered scores $\dot{s}_1^c, \dot{s}_2^c$ for $\beta$ and $q$) as

$$\dot{s}^*(s, x) = \dot{s}_1^c(s, x; \beta_0, q_{\beta_0}) - \Big\{ \sum_s w_s \mathbb{E}_{s,0}\big(\dot{s}_1^c \dot{s}_2^{c\top}\big) \Big\} \Big\{ \sum_s w_s \mathbb{E}_{s,0}\big(\dot{s}_2^c \dot{s}_2^{c\top}\big) \Big\}^{-1} \dot{s}_2^c(s, x; \beta_0, q_{\beta_0})$$

with the associated efficient information matrix expressed analogously (Hirose et al., 2012). When regularity and non-singularity conditions are fulfilled, estimators for $\beta$ achieve the semi-parametric efficiency bound; a numerical sketch of this score projection follows this list.

  • In Bayesian neural networks and approximate inference, ensuring that posteriors and predictives are invariant under reparametrization (i.e., two different parameter vectors $w, w'$ yielding the same function assignment) is essential for valid uncertainty quantification. Standard Laplace approximations or variational posteriors often lack this invariance. The geometry of reparametrization classes can be characterized via the kernel (nullspace) of the generalized Gauss-Newton (GGN) matrix, which captures directions in parameter space that do not affect the model output on the data,

$$\mathcal{P} = \mathbb{R}^D/\!\sim, \qquad \text{where} \quad w \sim w' \iff f(w, x_n) = f(w', x_n) \;\; \forall n$$

(Roy et al., 5 Jun 2024). Linearized Laplace approximations and Riemannian diffusion processes that respect the pullback metric defined by the data Jacobian and likelihood Hessian yield approximate posteriors that are invariant to the underlying parameterization.

  • In optimization, model reparametrization can dramatically reshape the landscape. For instance, neural reparametrization on sparse graphs, where optimization variables are predicted by a GNN whose weights are optimized, can effectively implement quasi-Newton-like updates without the computational overhead of large-scale Hessian inversion, achieving substantial speedups (Dehmamy et al., 2022).
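The efficient-score projection in the first bullet can be approximated numerically once centered scores are available for each observation. The following sketch is a generic Monte Carlo illustration with assumed inputs (per-group arrays of centered scores and group weights $w_s$); it is not code from Hirose et al. (2012):

```python
import numpy as np

def efficient_score(s1c, s2c, weights):
    """Approximate s* = s1c - B A^{-1} s2c from per-observation centered scores.

    s1c[g]: (n_g, p) centered scores for beta in group g (assumed inputs)
    s2c[g]: (n_g, k) centered scores for q in group g
    weights: (G,) group weights w_s
    """
    # B = sum_s w_s E[s1c s2c^T] and A = sum_s w_s E[s2c s2c^T], estimated by group means.
    B = sum(w * (a.T @ b) / len(a) for w, a, b in zip(weights, s1c, s2c))
    A = sum(w * (b.T @ b) / len(b) for w, b in zip(weights, s2c))
    proj = B @ np.linalg.inv(A)
    return [a - b @ proj.T for a, b in zip(s1c, s2c)]   # efficient score per observation

# Toy usage with two groups of synthetic scores (illustrative only).
rng = np.random.default_rng(0)
s1c = [rng.standard_normal((100, 2)) for _ in range(2)]
s2c = [rng.standard_normal((100, 3)) for _ in range(2)]
print(efficient_score(s1c, s2c, weights=np.array([0.5, 0.5]))[0].shape)  # (100, 2)
```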

3. Methodological Advances in Reparametrization for Inference and Optimization

Several classes of methodological development have emerged:

a. Variational Inference and Probabilistic Modeling

  • Acceptance-Rejection Reparametrization: For distributions such as gamma and Dirichlet (not amenable to the standard reparameterization trick), the acceptance-rejection step is integrated out analytically to yield a smooth term that allows unbiased, low-variance gradient estimation; the expectation over the target is rewritten as one over the proposal's transformed space, with a correction for the acceptance ratio (Naesseth et al., 2016). A sketch of the gamma case appears after this list.
  • Implicit Variational Inference via Model Reparametrization: The model is reparametrized so that MCMC kernels operate in a transformed latent space, optimizing ELBOs over the reparametrization parameters while using the reparameterization trick for gradient flows (Titsias, 2017).
  • Permutation Modeling: Reparametrizing relaxations of the permutation matrix space using stick-breaking or rounding maps allows optimization in the Birkhoff polytope, with reparameterization gradients enabling efficient variational inference over high-dimensional discrete structures (Linderman et al., 2017).
  • Reparametrized Gradients for Kalman Filtering: Nonlinear Kalman filters are developed where the optimization of an energy function (rather than an alpha-divergence directly) is performed with reparametrization gradients, leveraging the Cholesky-decomposed transformation to backpropagate through the Gaussian parameterization (Gultekin et al., 2023).
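For the gamma case referenced above, the smooth component of the acceptance-rejection sampler is the Marsaglia-Tsang transformation. The sketch below keeps only this differentiable map $z = h(\varepsilon, \alpha)$ for shape $\alpha \geq 1$ and omits the acceptance-ratio correction term, so it illustrates the reparameterized proposal rather than the full estimator of Naesseth et al. (2016):

```python
import torch

def gamma_reparam_proposal(alpha, eps):
    # Marsaglia-Tsang transformation h(eps, alpha) for shape alpha >= 1:
    #   z = (alpha - 1/3) * (1 + eps / sqrt(9*alpha - 3))**3,   eps ~ N(0, 1).
    d = alpha - 1.0 / 3.0
    c = 1.0 / torch.sqrt(9.0 * d)
    return d * (1.0 + c * eps) ** 3

alpha = torch.tensor(4.0, requires_grad=True)
eps = torch.randn(10_000)
z = gamma_reparam_proposal(alpha, eps)      # differentiable in the shape parameter
z.mean().backward()                         # gradient of the Monte Carlo mean w.r.t. alpha
print(z.mean().item(), alpha.grad.item())   # mean is close to alpha for a unit-rate gamma
```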

b. Reparametrization in Optimization and Learning

  • Low-rank Gradient Reparametrization for Differential Privacy: Rearranging per-example SGD gradients into low-rank factors (gradient-carrier and residual matrices) allows noise to be added in a low-dimensional space, reducing both computational and privacy costs with strong empirical results on large models (Yu et al., 2021); a schematic sketch follows this list.
  • Budget-aware Network Pruning: Implicit reparametrization of weights using continuous, differentiable mask functions allows simultaneous training and pruning without explicit fine-tuning stages, with a budget constraint for sparsity imposed in the objective (Dupont et al., 2021).
  • Neural Reparametrization for Sparse Graph Optimization: Parameterizing optimization variables through a GNN whose topology is dictated by precomputed Hessians or Laplacians, achieving fast convergence by embedding approximate second-order curvature information into the architectural design (Dehmamy et al., 2022).
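The following sketch illustrates only the general principle behind such low-rank gradient reparametrization (projecting a weight gradient onto rank-$k$ carriers, clipping and adding Gaussian noise in the low-dimensional space, then mapping back); the carrier construction, per-example clipping, and residual handling of the actual method in Yu et al. (2021) differ and are not reproduced here:

```python
import torch

def noisy_lowrank_update(grad, k, clip_norm, noise_mult):
    # grad: (d_out, d_in) weight gradient; k: rank of the carrier subspace (assumptions).
    U, _, Vh = torch.linalg.svd(grad, full_matrices=False)
    U_k, V_k = U[:, :k], Vh[:k, :].T            # left/right gradient carriers
    A = U_k.T @ grad @ V_k                       # (k, k) low-dimensional representation
    A = A * min(1.0, clip_norm / (A.norm().item() + 1e-12))     # norm clipping
    A = A + noise_mult * clip_norm * torch.randn_like(A)        # noise in the small space
    return U_k @ A @ V_k.T                       # map the noisy update back to weight space

g = torch.randn(512, 256)
print(noisy_lowrank_update(g, k=8, clip_norm=1.0, noise_mult=0.5).shape)  # torch.Size([512, 256])
```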

c. Reparametrization in Physics-Informed Neural Networks

  • Boundary Condition Enforcement: Introducing multiplicative factors $B(x)$ applied to the neural network output ensures that boundary conditions are satisfied by construction, avoiding penalization terms in the loss and yielding improved stability and reduced approximation error in challenging PDE scenarios (Nand et al., 2023).
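As a minimal illustration of this kind of output reparametrization, consider an assumed 1D problem with homogeneous Dirichlet conditions $u(0)=u(1)=0$ and the convenient vanishing factor $B(x)=x(1-x)$ (an illustrative choice, not the specific construction of Nand et al., 2023):

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

def u(x):
    # Reparametrized ansatz: B(x) * NN(x) vanishes at x = 0 and x = 1, so the
    # Dirichlet conditions u(0) = u(1) = 0 hold by construction (no boundary penalty).
    B = x * (1.0 - x)
    return B * net(x)

x = torch.tensor([[0.0], [0.5], [1.0]])
print(u(x))   # exactly zero at both boundary points
```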

d. Geometry and Invariance

  • Riemannian Geometry of Neural Network Parameter Spaces: Explicit representation and transformation of the metric allow all quantities such as flatness (Hessian spectrum), optimization dynamics, and posterior density to be made invariant under reparametrizations. The Hessian transforms as a (0,2)-tensor; the volume element for probability densities is corrected by the square root of the metric determinant (yielding, e.g., the Jeffreys prior as the natural uninformative prior under the Riemannian geometry) (Kristiadi et al., 2023).
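In coordinates, these two statements take the following standard form for a reparametrization $\theta = \varphi(\psi)$ with Jacobian $J = \partial\theta/\partial\psi$, evaluated at a minimum of the loss; this is a summary of the usual transformation laws rather than notation taken verbatim from the cited paper:

```latex
% Hessian of the loss at a critical point transforms as a (0,2)-tensor:
H_\psi = J^\top H_\theta \, J, \qquad J = \frac{\partial \theta}{\partial \psi}.

% Densities are made reparametrization-invariant by the Riemannian volume element:
dV = \sqrt{\det g(\theta)} \, d\theta, \qquad
p(\theta) \propto \sqrt{\det g(\theta)} \quad
\text{(the Jeffreys prior when $g$ is the Fisher information metric).}
```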

4. Computational Benefits and Variance Reduction

A central benefit claimed for reparametrization-based methods is improved computational efficiency and reduced gradient variance:

  • Low-variance Gradient Estimators: Integrating out auxiliary variables via reparametrization, as in AR-based methods (Naesseth et al., 2016), or using the local reparameterization trick (a special case of Rao-Blackwellised reparameterization) in neural networks (Lam et al., 9 Jun 2025), results in unbiased gradient estimates with provably lower variance. Specifically, conditioning the reparameterization gradient on intermediate linear outputs (e.g., pre-activations) implements Rao-Blackwellisation and is shown to recover or improve on the empirical gains of the local reparameterization trick for Bayesian MLPs and VAEs; a sketch of the local reparameterization trick follows this list.
  • Accelerated Training and Inference: In differentially private learning, low-rank reparametrization allows noise to be concentrated in the informative subspace of the gradients, improving both accuracy and privacy guarantees (Yu et al., 2021). In high-dimensional rendering, reparametrization-based neural BRDF samplers can match target distributions via flexible, non-invertible neural mappings, yielding both lower variance in Monte Carlo estimates and higher speed compared to normalizing flows or multi-step samplers (Wu et al., 13 May 2025).
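A minimal sketch of the local reparameterization trick for a mean-field Gaussian linear layer (illustrative shapes and variable names, not code from the cited works): instead of sampling a weight matrix per forward pass, the Gaussian pre-activations implied by the weight posterior are sampled directly, which lowers the gradient variance per minibatch.

```python
import torch

def local_reparam_linear(x, w_mu, w_logvar):
    # q(W) = N(w_mu, diag(exp(w_logvar))), factorized over entries.
    # The pre-activations y = x @ W are then Gaussian, so sample them directly:
    #   mean_y = x @ w_mu,   var_y = x^2 @ exp(w_logvar).
    mean_y = x @ w_mu
    var_y = (x ** 2) @ w_logvar.exp()
    eps = torch.randn_like(mean_y)           # one noise draw per pre-activation
    return mean_y + var_y.sqrt() * eps

x = torch.randn(64, 100)                                   # minibatch of 64 inputs
w_mu = (0.1 * torch.randn(100, 10)).requires_grad_()
w_logvar = torch.full((100, 10), -5.0, requires_grad=True)
y = local_reparam_linear(x, w_mu, w_logvar)
print(y.shape)                                             # torch.Size([64, 10])
```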

5. Applications Across Domains

Reparametrization-based techniques have been systematically deployed in multiple research areas:

| Domain | Reparametrization Approach | Reference |
| --- | --- | --- |
| Variational inference in semi-parametric models | Finite-dimensional nuisance parameterization via $q_\beta$ | (Hirose et al., 2012) |
| Complex variational families (gamma, Dirichlet) | Acceptance-rejection reparametrization and marginalization | (Naesseth et al., 2016) |
| Implicit variational inference | Model-based invertible transformation for MCMC/VI | (Titsias, 2017) |
| Inference over permutations | Stick-breaking, rounding maps over the Birkhoff polytope | (Linderman et al., 2017) |
| Geometry of Bayesian neural networks | Riemannian metrics, invariance under parameter transformations | (Kristiadi et al., 2023; Roy et al., 5 Jun 2024) |
| Differential privacy in deep learning | Gradient-space low-rank reparametrization | (Yu et al., 2021) |
| Physics-informed neural network solvers | Enforcing boundary conditions by analytical transformation | (Nand et al., 2023) |
| Optimization on graphs | GNN-based parameter reparametrization leveraging Hessians | (Dehmamy et al., 2022) |
| Physically-based rendering | Non-invertible neural BRDF sampling by change of variables | (Wu et al., 13 May 2025) |

6. Structural Invariance, Geometry, and Limitations

A recurring theme is the importance—and nontriviality—of ensuring invariance under reparametrization, particularly in Bayesian inference. Failure to account for weight-space symmetries or null directions can lead to posteriors that are miscalibrated or sensitive to non-identifiable parameterizations (Roy et al., 5 Jun 2024). Geometric approaches that employ the pullback metric, quotient spaces, or explicit Riemannian metrics yield methods whose predictions and uncertainty quantification are independent of such redundancies.

However, enforcing invariance can introduce additional computational and modeling challenges. For example, in the context of infinite-width neural networks, differing parameterizations (e.g., the standard parameterization vs. the NTK parameterization) correspond to genuinely different functions and priors, so the concept of "reparametrization" must be carefully circumscribed (Kristiadi et al., 2023). For many practical algorithms, the choice of reparametrization (e.g., which latent measure to push forward, which structure to enforce by design, which basis to use for low-rank projections) implies trade-offs between expressiveness, identifiability, and efficiency.

7. Summary and Outlook

Reparametrization-based methods constitute a foundational toolkit in modern statistical and machine learning methodology. Through the judicious introduction of new variables, mappings, or transformed architectures, these methods:

  • Simplify inference and optimization by reducing high- or infinite-dimensional nuisance structure to manageable finite parametrizations.
  • Enable unbiased, low-variance, and efficient gradient estimation, even for challenging distributional families and compositional latent variable models.
  • Ensure invariance of statistical learning outcomes under changes of parameterization by leveraging the geometry of the underlying function space.
  • Achieve superior empirical performance in models where computational, memory, or privacy constraints make conventional approaches infeasible.

Continued development and analysis of reparametrization-based methods—especially with a focus on invariance, structure encoding, and computational efficiency—will remain central as models grow in complexity and as the need for robust, generalizable, and interpretable inference sharpens across domains.
