
BayesLineFit: Max-Likelihood Orthogonal Regression

Updated 30 December 2025
  • BayesLineFit is a rigorous method for maximum-likelihood orthogonal regression that achieves unbiased parameter estimation by modeling measurement errors in both axes and accounting for latent scatter.
  • It employs advanced normalizing flows and variational empirical Bayes to flexibly estimate intrinsic distributions and error correlations, enhancing robustness in under- and overdetermined cases.
  • The algorithm leverages GPU-accelerated gradient-based optimization with iterative quadrature, delivering superior performance for large noisy datasets and low SNR scenarios.

Maximum-likelihood Orthogonal Regression (BayesLineFit) is a rigorous statistical technique for line fitting where measurement errors occur in both dependent and independent variables. It achieves unbiased parameter estimation by maximizing the likelihood under a joint model that accounts for measurement uncertainties, intrinsic population distributions, and potential latent scatter orthogonal to the regression line. The framework generalizes classic orthogonal regression through maximum-likelihood estimation (MLE), and introduces computational advances via normalizing flows for empirical Bayes modeling of latent variable distributions, robust gradient-based optimization, and efficient iterative solutions for both overdetermined and underdetermined cases (Jing et al., 13 Nov 2024, Guo et al., 15 Jul 2025).

1. Model Formulation and Likelihood Structure

Maximum-likelihood orthogonal regression posits that each observed data point $(x_{{\rm obs},i}, \sigma_{x,i}, y_{{\rm obs},i}, \sigma_{y,i})$ arises from latent true values $(x_i, y_i)$ subject to Gaussian measurement noise

$$x_{{\rm obs},i} \sim \mathcal{N}(x_i, \sigma_{x,i}^2), \qquad y_{{\rm obs},i} \sim \mathcal{N}(y_i, \sigma_{y,i}^2).$$

A regression law with orthogonal scatter is imposed:
$$d_i = \frac{y_i - k x_i - b}{\sqrt{1 + k^2}}, \qquad d_i \sim \mathcal{N}(0, \sigma_{\rm int}^2),$$
where $k$ is the slope, $b$ the intercept, and $\sigma_{\rm int}$ the intrinsic orthogonal scatter. To account for the unobservable intrinsic distributions and correlations between measurement errors and values, population priors $P(x_i, \sigma_{x,i} \mid \phi_x)$ and $P(y_i, \sigma_{y,i} \mid \phi_y)$ are introduced, each parametrized by normalizing flows.

The joint likelihood for point $i$ is:
$$P_i = \int dx_i \, dy_i \; \mathcal{N}(x_{{\rm obs},i} \mid x_i,\sigma_{x,i}^2)\, \mathcal{N}(y_{{\rm obs},i} \mid y_i, \sigma_{y,i}^2) \left[\sqrt{1 + k^2}\, \mathcal{N}\!\left( \frac{y_i - k x_i - b}{\sqrt{1 + k^2}} \;\Big|\; 0, \sigma_{\rm int}^2 \right) \right] P(\sigma_{x,i} \mid x_i; \phi_{x,1})\, P(x_i; \phi_{x,2})\, P(\sigma_{y,i} \mid y_i; \phi_{y,1})\, P(y_i; \phi_{y,2}).$$
Under independence, the total likelihood is $L(\theta, \phi) = \prod_{i=1}^N P_i$, to be maximized with respect to the regression parameters $\theta = (k, b, \sigma_{\rm int})$ and the population parameters $\phi$ (Jing et al., 13 Nov 2024).
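As a concrete illustration, $P_i$ can be approximated with nested Gauss–Hermite quadrature over the latent $(x_i, y_i)$. The NumPy sketch below uses hypothetical flat priors in place of the normalizing-flow population terms; the function and argument names are illustrative, not the released implementation:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def point_likelihood(x_obs, sx, y_obs, sy, k, b, sigma_int, n_nodes=32,
                     prior_x=lambda x: 1.0, prior_y=lambda y: 1.0):
    """Approximate P_i by nested Gauss-Hermite quadrature over the latent (x_i, y_i).

    prior_x / prior_y stand in for the normalizing-flow population terms;
    the flat defaults are a simplification of the full model.
    """
    t, w = hermgauss(n_nodes)                   # nodes/weights for integral of e^{-t^2} f(t) dt
    x = x_obs + np.sqrt(2.0) * sx * t           # substitution absorbs N(x_obs | x, sx^2)
    y = y_obs + np.sqrt(2.0) * sy * t
    X, Y = np.meshgrid(x, y, indexing="ij")     # grid of latent (x_i, y_i) values
    Wx, Wy = np.meshgrid(w, w, indexing="ij")
    d = (Y - k * X - b) / np.sqrt(1.0 + k**2)   # orthogonal distance to the line
    line = (np.sqrt(1.0 + k**2) * np.exp(-0.5 * (d / sigma_int)**2)
            / (np.sqrt(2.0 * np.pi) * sigma_int))
    integrand = line * prior_x(X) * prior_y(Y)
    return np.sum(Wx * Wy * integrand) / np.pi  # 1/pi from the two Gaussian measurement kernels
```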

2. Variational Empirical Bayes and Latent Population Modeling

Given the latent nature of the population distributions $P(x)$, $P(\sigma_x \mid x)$ and their analogues for $y$, the methodology implements normalizing flows (NFs) for flexible, nonparametric density estimation. The flows establish a bijection $f_\phi$ from $(x, \sigma_x)$ to a latent space $z$ with a Gaussian base density, so that
$$P_\phi(x, \sigma_x) = P_z(f_\phi(x, \sigma_x)) \cdot \left|\det \partial f_\phi / \partial (x, \sigma_x)\right|.$$
Parameter estimation for the NF components $\phi_x$ and $\phi_y$ proceeds by marginal-likelihood maximization over the observed $(x_{{\rm obs},i}, \sigma_{x,i})$ and $(y_{{\rm obs},i}, \sigma_{y,i})$ using variational inference (e.g., EM) and 1D Gauss–Hermite quadrature. The pzflow library employing RollingSplineCoupling layers—two layers, 16 bins, and a log-rescaling input transformation—serves as the reference implementation (Jing et al., 13 Nov 2024).

This empirical Bayes procedure allows the regression model to accommodate arbitrary intrinsic distributions and error-value correlations, enhancing robustness beyond classical parametric assumptions.
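A minimal JAX sketch of the change-of-variables evaluation above, with a toy affine bijection standing in for a trained spline-coupling flow (all names illustrative):

```python
import jax.numpy as jnp
from jax import jacfwd
from jax.scipy.stats import multivariate_normal

def flow_density(f, u):
    """P_phi(u) = P_z(f(u)) * |det df/du| for a bijection f mapping (x, sigma_x) to z."""
    z = f(u)
    jac = jacfwd(f)(u)                                    # 2x2 Jacobian of the bijection at u
    log_det = jnp.log(jnp.abs(jnp.linalg.det(jac)))
    log_pz = multivariate_normal.logpdf(z, jnp.zeros(2), jnp.eye(2))  # Gaussian base density
    return jnp.exp(log_pz + log_det)

# Toy affine bijection; a real flow would be the trained spline-coupling model.
A = jnp.array([[1.5, 0.0], [0.3, 0.8]])
density = flow_density(lambda u: A @ u, jnp.array([0.2, -0.1]))
```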

3. Numerical Optimization Workflow

Optimization follows a staged procedure:

  • Preprocessing: Compute medians, rescale, and optionally shrink $x$ values for initialization.
  • NF Training: Pretrain flows for 200 epochs on rescaled data, then maximize the marginal likelihood using Adam with a learning rate of $3\times 10^{-3}$, full-batch training, and early stopping (patience of 10 epochs).
  • MLE Fit of $\theta$: With flows fixed, initialize $(k, b, \sigma_{\rm int})$ from alternative estimators (ODR, GMM). For each step, numerically compute $P_i(\theta)$ using nested Gauss–Hermite quadrature, sum the log-likelihood, backpropagate gradients, and update via Adam until convergence (parameter change threshold $<10^{-5}$).
  • Posterior Sampling (optional): Importance sampling on GPU, drawing from priors and re-weighting by likelihood.

All major computations (NF transforms, quadrature, gradient steps) are GPU-accelerated (Jing et al., 13 Nov 2024). This enables application to large datasets ($N>1000$) with high computational efficiency.
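A simplified, self-contained sketch of the MLE stage is given below in JAX with the optax optimizer (an assumption; the source only states Adam and JAX). Flat population priors stand in for the trained flows, $\sigma_{\rm int}$ is parametrized on a log scale for positivity, and all names are illustrative rather than the released API:

```python
import jax
import jax.numpy as jnp
import optax
from numpy.polynomial.hermite import hermgauss

t, w = map(jnp.asarray, hermgauss(32))          # fixed Gauss-Hermite nodes and weights

def log_lik(theta, x_obs, sx, y_obs, sy):
    """Total log-likelihood sum_i log P_i(theta), with flat population priors
    (a simplification; the full model multiplies in the trained flow densities)."""
    k, b, log_sig = theta
    sig = jnp.exp(log_sig)                                      # keep sigma_int positive
    X = x_obs[:, None, None] + jnp.sqrt(2.0) * sx[:, None, None] * t[None, :, None]
    Y = y_obs[:, None, None] + jnp.sqrt(2.0) * sy[:, None, None] * t[None, None, :]
    d = (Y - k * X - b) / jnp.sqrt(1.0 + k**2)                  # orthogonal residual
    line = (jnp.sqrt(1.0 + k**2) * jnp.exp(-0.5 * (d / sig)**2)
            / (jnp.sqrt(2.0 * jnp.pi) * sig))
    P = jnp.einsum("j,l,njl->n", w, w, line) / jnp.pi           # nested quadrature per point
    return jnp.sum(jnp.log(P))

def fit(theta0, data, lr=3e-3, tol=1e-5, max_steps=5000):
    """Adam ascent on the log-likelihood, stopping when the parameter change is < tol."""
    opt = optax.adam(lr)
    state = opt.init(theta0)
    grad = jax.jit(jax.grad(lambda th: -log_lik(th, *data)))    # minimize negative log-lik
    theta = theta0
    for _ in range(max_steps):
        updates, state = opt.update(grad(theta), state)
        new_theta = optax.apply_updates(theta, updates)
        if jnp.max(jnp.abs(new_theta - theta)) < tol:           # convergence threshold
            return new_theta
        theta = new_theta
    return theta
```

Here `data = (x_obs, sx, y_obs, sy)` as JAX arrays, and `theta0` would come from an ODR or GMM initialization as described in the workflow above.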

4. Generalized RV-ML Framework and Convexity Theory

Maximum-likelihood orthogonal regression is a specialization of the more general errors-in-variables problem with Gaussian model uncertainty. The underlying model is $y = (H + E)x + \epsilon$, where $H$ is a nominal matrix, $E$ is Gaussian noise independent across entries, and $\epsilon$ is output noise. Marginalizing over $E$ and $\epsilon$ leads to the effective noise covariance

$$\mathrm{Cov}[\epsilon_{\rm eq}] = \sigma_e^2 \|x\|_2^2 \, I_M + \sigma_\epsilon^2 I_M,$$

yielding a log-likelihood function for $x$:
$$\ell(x) = -\frac{1}{2} (y-Hx)^T \left[( \sigma_e^2 \|x\|_2^2 + \sigma_\epsilon^2 )I\right]^{-1} (y-Hx) - \frac{1}{2} M \ln( \sigma_e^2 \|x\|_2^2 + \sigma_\epsilon^2 ).$$
A convex reformulation (via SVD and lifting) facilitates the development of the generalized RV-ML (GRV-ML) algorithm, which handles both underdetermined and overdetermined systems with strong duality, improved complexity, and guaranteed global optima (Guo et al., 15 Jul 2025).
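Because the effective covariance is a scaled identity, $\ell(x)$ can be evaluated directly; a brief NumPy sketch (function name illustrative, additive constants dropped):

```python
import numpy as np

def grv_ml_loglik(x, y, H, sigma_e, sigma_eps):
    """Log-likelihood l(x) for y = (H + E)x + eps, up to an additive constant."""
    M = y.shape[0]
    var = sigma_e**2 * np.dot(x, x) + sigma_eps**2   # effective per-component noise variance
    r = y - H @ x                                    # residual under the nominal matrix H
    return -0.5 * np.dot(r, r) / var - 0.5 * M * np.log(var)
```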

5. Closed-form Characterization and Solution Regimes

By examining the KKT conditions of the lifted convex program, all solutions classify into four distinct regimes (A–D) depending on system rank and a scalar $S$ derived from the data, measurement matrix, and noise parameters. Each regime has explicit formulas for solution existence, uniqueness, and multiplicity.

  • Case A: $R \leq N-1$, $S \geq 0$ — infinitely many ML solutions.
  • Case B/D: $R \leq N-1$ or $R=N$, $S \neq 0$ — unique solution via bisection for the scalar $\nu^*$.
  • Case C: $R=N$, $S=0$ — unique closed-form solution.

See the full specification in (Guo et al., 15 Jul 2025) for precise formulas and interval choices. Bisection typically converges within $O(\log(T/\epsilon))$ steps.

6. Specialization to 2D Orthogonal Regression: BayesLineFit Algorithm

For the 2D orthogonal errors-in-variables line fit, the GRV-ML formalism reduces to parameter estimation for the line slope $m$ and intercept $c$ from a noisy set $(x_i, y_i)$:
$$m(\nu) = \frac{\sum_{i=1}^M x_i y_i / [1 + 2 \nu \sigma_e^2 x_i^2]}{\sum_{i=1}^M x_i^2 / [1 + 2 \nu \sigma_e^2 x_i^2]}, \qquad c(\nu) = \frac{\sum_{i=1}^M (y_i - m(\nu) x_i) / [1 + 2 \nu \sigma_e^2 x_i^2]}{\sum_{i=1}^M 1 / [1 + 2 \nu \sigma_e^2 x_i^2]}.$$
The scalar parameter $\nu$ is solved by bisection to satisfy $g(\nu)=0$, where $g(\nu)$ is derived from the KKT constraints. Iteration proceeds as: select $\nu$, compute $(m, c)$, evaluate $g(\nu)$, and update $\nu$, typically converging in $10$–$20$ steps. Per-iteration complexity is $O(M)$ (Guo et al., 15 Jul 2025). This method is referred to as BayesLineFit.
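The per-iteration update is a few lines of code. The NumPy sketch below implements $m(\nu)$, $c(\nu)$ and a generic bisection driver; the root function $g(\nu)$ comes from the KKT conditions in (Guo et al., 15 Jul 2025) and is not reproduced here, so it is left as a user-supplied callable. The bracket endpoints and sign convention are illustrative assumptions:

```python
import numpy as np

def m_c_of_nu(nu, x, y, sigma_e):
    """Slope and intercept as functions of the scalar dual variable nu."""
    wgt = 1.0 / (1.0 + 2.0 * nu * sigma_e**2 * x**2)
    m = np.sum(x * y * wgt) / np.sum(x**2 * wgt)
    c = np.sum((y - m * x) * wgt) / np.sum(wgt)
    return m, c

def bayeslinefit_2d(x, y, sigma_e, g, nu_lo=0.0, nu_hi=1e3, tol=1e-10):
    """Bisection on g(nu) = 0; g must encode the paper's KKT condition
    and is supplied by the caller (not derived here)."""
    while nu_hi - nu_lo > tol:
        nu = 0.5 * (nu_lo + nu_hi)
        m, c = m_c_of_nu(nu, x, y, sigma_e)
        if g(nu, m, c) > 0.0:          # move the bracket according to the sign of g
            nu_hi = nu
        else:
            nu_lo = nu
    return m_c_of_nu(0.5 * (nu_lo + nu_hi), x, y, sigma_e)
```

Each bisection step costs $O(M)$, consistent with the complexity stated above.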

7. Statistical Properties, Performance, and Implementation

The method achieves unbiased, self-consistent orthogonal regression that:

  • Accounts for noise in both axes,
  • Models arbitrary intrinsic distributions via normalizing flows,
  • Empirically infers latent error–value correlations $P(\sigma_x \mid x)$ and $P(\sigma_y \mid y)$,
  • Leverages GPU acceleration for scalability.

Performance benchmarks (Jing et al., 13 Nov 2024) indicate that with sample sizes $N > 1000$, both ML-based and KS-test-based methods significantly outperform classical regression in low-SNR scenarios, while for $300 < N < 1000$, ML-based approaches provide the best results. For $N < 300$, performance is comparable to existing state-of-the-art methods. The Python implementation "raddest", utilizing JAX and pzflow, will be released. The Cramér–Rao bound for model uncertainty is derived in closed form and reveals that input-matrix randomness can be beneficial, especially in underdetermined systems (Guo et al., 15 Jul 2025).

This suggests BayesLineFit is especially suited for modern astronomical datasets with complex distributions and strongly heterogeneous measurement errors, and may generalize to broader inverse problems with errors-in-variables and latent structure.
