BayesLineFit: Max-Likelihood Orthogonal Regression
- BayesLineFit is a rigorous method for maximum-likelihood orthogonal regression that achieves unbiased parameter estimation by modeling measurement errors in both axes and accounting for latent scatter.
- It employs normalizing flows and variational empirical Bayes to flexibly estimate intrinsic distributions and error-value correlations, relaxing classical parametric assumptions, while a convex generalized RV-ML formulation covers both under- and overdetermined systems.
- The algorithm leverages GPU-accelerated gradient-based optimization with iterative Gauss–Hermite quadrature, delivering strong performance on large, noisy datasets and in low-SNR scenarios.
Maximum-likelihood Orthogonal Regression (BayesLineFit) is a rigorous statistical technique for line fitting where measurement errors occur in both dependent and independent variables. It achieves unbiased parameter estimation by maximizing the likelihood under a joint model that accounts for measurement uncertainties, intrinsic population distributions, and potential latent scatter orthogonal to the regression line. The framework generalizes classic orthogonal regression through maximum-likelihood estimation (MLE), and introduces computational advances via normalizing flows for empirical Bayes modeling of latent variable distributions, robust gradient-based optimization, and efficient iterative solutions for both overdetermined and underdetermined cases (Jing et al., 13 Nov 2024, Guo et al., 15 Jul 2025).
1. Model Formulation and Likelihood Structure
Maximum-likelihood orthogonal regression posits that each observed data point $(x_i, y_i)$ arises from latent true values $(\xi_i, \eta_i)$ subject to Gaussian measurement noise,
$$x_i = \xi_i + \epsilon_{x,i}, \qquad y_i = \eta_i + \epsilon_{y,i}, \qquad \epsilon_{x,i} \sim \mathcal{N}(0, \sigma_{x,i}^2), \quad \epsilon_{y,i} \sim \mathcal{N}(0, \sigma_{y,i}^2).$$
A regression law with orthogonal scatter is imposed: the latent points scatter about the line $\eta = m\,\xi + b$ with Gaussian dispersion $\sigma_\perp$ measured perpendicular to the line, where $m$ is the slope, $b$ the intercept, and $\sigma_\perp$ the intrinsic orthogonal scatter. To account for the unobservable intrinsic distributions and for correlations between measurement errors and the latent values, population priors over the latent values and over the error amplitudes conditioned on those values are introduced, each parametrized by normalizing flows.
The joint likelihood for point $i$ marginalizes over the latent values,
$$\mathcal{L}_i(m, b, \sigma_\perp) = \int p(x_i, y_i \mid \xi, \eta)\, p(\xi, \eta \mid m, b, \sigma_\perp)\, \mathrm{d}\xi\, \mathrm{d}\eta .$$
Under independence, the total likelihood is $\mathcal{L} = \prod_i \mathcal{L}_i$, to be maximized with respect to the line parameters $(m, b, \sigma_\perp)$ and the population (prior) parameters (Jing et al., 13 Nov 2024).
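As a point of reference (not taken from the cited papers), in the simplified case of Gaussian errors and an effectively uniform prior along the line, this marginalization is analytic and reduces to the standard orthogonal-displacement likelihood:
$$\ln \mathcal{L}_i = -\tfrac{1}{2}\ln\!\big(2\pi\,\Sigma_i^2\big) - \frac{(y_i - m x_i - b)^2}{2\,\Sigma_i^2\,(1+m^2)}, \qquad \Sigma_i^2 = \sigma_\perp^2 + \frac{m^2\sigma_{x,i}^2 + \sigma_{y,i}^2}{1+m^2}.$$
The full method replaces the uniform prior with the flow-based population priors described next.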
2. Variational Empirical Bayes and Latent Population Modeling
Given the latent nature of the population distributions (for the latent values and, analogously, for the measurement errors), the methodology implements normalizing flows (NFs) for flexible, nonparametric density estimation. The flows establish a bijection $f_\phi$ from the data variable to a latent space with a Gaussian base density $p_z$, translating to
$$p(\xi) = p_z\!\big(f_\phi(\xi)\big)\,\left|\det \frac{\partial f_\phi(\xi)}{\partial \xi}\right| .$$
Parameter estimation for the NF components proceeds by marginal-likelihood maximization over the observed data, using variational inference (e.g., EM) and 1D Gauss–Hermite quadrature. The pzflow library employing RollingSplineCoupling layers—two layers, 16 bins, and a log-rescaling input transformation—serves as the reference implementation (Jing et al., 13 Nov 2024).
This empirical Bayes procedure allows the regression model to accommodate arbitrary intrinsic distributions and error-value correlations, enhancing robustness beyond classical parametric assumptions.
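A minimal sketch of this density-estimation stage using pzflow is shown below. It assumes pzflow's `Flow`/`train`/`log_prob` interface; the column names, synthetic data, and bijector constructor arguments (chosen to mirror the stated two-layer, 16-bin configuration) are illustrative assumptions rather than the reference setup.

```python
import numpy as np
import pandas as pd
from pzflow import Flow
from pzflow.bijectors import Chain, RollingSplineCoupling, StandardScaler

# Illustrative joint population of a latent value and its measurement-error
# amplitude, standing in for the unobserved intrinsic distribution.
rng = np.random.default_rng(0)
xi = rng.normal(0.0, 1.0, size=5000)
sigma = np.exp(-0.5 + 0.3 * xi + 0.2 * rng.normal(size=5000))   # error-value correlation
df = pd.DataFrame({"xi": xi, "log_sigma": np.log(sigma)})        # log-rescaled errors

# Bijector roughly mirroring the described setup: standardize the inputs, then
# apply rolling spline coupling layers (two layers, 16 bins) onto a Gaussian base.
bijector = Chain(
    StandardScaler(df.mean().to_numpy(), df.std().to_numpy()),
    RollingSplineCoupling(nlayers=2, K=16),
)

flow = Flow(data_columns=("xi", "log_sigma"), bijector=bijector)
losses = flow.train(df, epochs=200, verbose=False)   # NF pretraining stage

log_p = flow.log_prob(df)        # per-sample log density under the trained flow
draws = flow.sample(1000, seed=0)
```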
3. Numerical Optimization Workflow
Optimization follows a staged procedure:
- Preprocessing: Compute medians, rescale and optionally shrink values for initialization.
- NF Training: Pretrain the flows for 200 epochs on rescaled data, then maximize the marginal likelihood using Adam with a fixed learning rate, full-batch training, and early stopping (patience of 10 epochs).
- MLE Fit of the Line Parameters: With the flows fixed, initialize $(m, b, \sigma_\perp)$ from alternative estimators (ODR, GMM). At each step, numerically compute the per-point marginal likelihood using nested Gauss–Hermite quadrature, sum the log-likelihood, backpropagate gradients, and update via Adam until convergence (parameter change below a preset threshold); a simplified sketch follows this list.
- Posterior Sampling (optional): Importance sampling on GPU, drawing from priors and re-weighting by likelihood.
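The sketch below illustrates the quadrature-plus-gradient MLE step (it is not the released implementation): a fixed Gaussian population prior stands in for the trained flows, the intrinsic scatter enters vertically rather than orthogonally for brevity, and optax supplies Adam; all function and parameter names are illustrative.

```python
import jax
import jax.numpy as jnp
import numpy as np
import optax

# Fixed Gauss-Hermite nodes/weights for the 1D integral over the latent xi.
NODES, WEIGHTS = np.polynomial.hermite.hermgauss(64)

def marginal_loglik(params, x, y, sx, sy, mu0=0.0, tau=2.0):
    """Total marginal log-likelihood of a line fit with errors on both axes.

    Simplifications relative to the full method: a Gaussian population prior
    N(mu0, tau^2) replaces the normalizing-flow prior, and the intrinsic
    scatter is vertical. The latent xi is integrated out per point with
    Gauss-Hermite quadrature, mirroring the workflow above.
    """
    m, b, log_s_int = params
    s_int2 = jnp.exp(2.0 * log_s_int)
    xi = mu0 + jnp.sqrt(2.0) * tau * NODES             # quadrature nodes, shape (K,)
    logw = jnp.log(WEIGHTS / jnp.sqrt(jnp.pi))         # normalized log-weights
    # log N(x_i | xi_k, sx_i^2)
    lx = -0.5 * ((x[:, None] - xi[None, :]) / sx[:, None]) ** 2 \
         - jnp.log(sx[:, None]) - 0.5 * jnp.log(2.0 * jnp.pi)
    # log N(y_i | m * xi_k + b, sy_i^2 + s_int^2)
    vy = sy[:, None] ** 2 + s_int2
    ly = -0.5 * (y[:, None] - (m * xi[None, :] + b)) ** 2 / vy \
         - 0.5 * jnp.log(2.0 * jnp.pi * vy)
    # Marginalize xi: log sum_k w_k exp(lx + ly), then sum over points.
    return jnp.sum(jax.scipy.special.logsumexp(lx + ly + logw[None, :], axis=1))

def fit(x, y, sx, sy, steps=2000, lr=1e-2):
    """Adam ascent on the marginal log-likelihood (the gradient-based MLE step)."""
    params = jnp.array([1.0, 0.0, np.log(0.1)])        # init: slope, intercept, log-scatter
    opt = optax.adam(lr)
    state = opt.init(params)
    neg_ll = jax.jit(jax.value_and_grad(lambda p: -marginal_loglik(p, x, y, sx, sy)))
    for _ in range(steps):
        _, grads = neg_ll(params)
        updates, state = opt.update(grads, state)
        params = optax.apply_updates(params, updates)
    return params                                       # (m, b, log sigma_int)
```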
All major computations (NF transforms, quadrature, gradient steps) are GPU-accelerated (Jing et al., 13 Nov 2024). This enables application to large datasets with high computational efficiency.
4. Generalized RV-ML Framework and Convexity Theory
Maximum-likelihood orthogonal regression is a specialization of the more general errors-in-variables problem with Gaussian model uncertainty. The underlying model is
$$\mathbf{y} = (\mathbf{A} + \mathbf{E})\,\mathbf{x} + \mathbf{w},$$
where $\mathbf{A}$ is a nominal matrix, $\mathbf{E}$ is Gaussian noise independent across entries (per-entry variance $\sigma_E^2$), and $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma_w^2 \mathbf{I})$ is output noise. Marginalizing over $\mathbf{E}$ and $\mathbf{w}$ leads to the effective noise covariance
$$\mathbf{C}(\mathbf{x}) = \big(\sigma_E^2 \|\mathbf{x}\|_2^2 + \sigma_w^2\big)\,\mathbf{I},$$
yielding a log-likelihood function for $\mathbf{x}$ (with $M = \dim\mathbf{y}$):
$$\ell(\mathbf{x}) = -\frac{M}{2}\ln\!\big(\sigma_E^2 \|\mathbf{x}\|_2^2 + \sigma_w^2\big) - \frac{\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2}{2\big(\sigma_E^2 \|\mathbf{x}\|_2^2 + \sigma_w^2\big)} + \text{const}.$$
Convex reformulation (via SVD and lifting) facilitates the development of the generalized RV-ML (GRV-ML) algorithm, which handles both underdetermined and overdetermined systems with strong duality, improved complexity, and guaranteed global optima (Guo et al., 15 Jul 2025).
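A brief numerical illustration of this marginalized likelihood follows; the function name and the noise-variance symbols (`sigma_E2`, `sigma_w2`) are notation chosen here, not the paper's.

```python
import numpy as np

def grvml_negloglik(x, A, y, sigma_E2, sigma_w2):
    """Negative log-likelihood of y = (A + E) x + w after marginalizing E and w.

    Under the model above, y | x ~ N(A x, (sigma_E2 * ||x||^2 + sigma_w2) * I),
    so the covariance is isotropic with an x-dependent variance.
    """
    M = y.shape[0]
    var = sigma_E2 * float(x @ x) + sigma_w2        # effective per-entry variance
    resid = y - A @ x
    return 0.5 * M * np.log(2.0 * np.pi * var) + 0.5 * float(resid @ resid) / var
```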
5. Closed-form Characterization and Solution Regimes
By examining the KKT conditions of the lifted convex program, all solutions fall into four distinct regimes (A–D), delimited by the system rank and a scalar statistic derived from the data, measurement matrix, and noise parameters. Each regime has explicit formulas for solution existence, uniqueness, and multiplicity.
- Case A: infinitely many ML solutions.
- Cases B/D: a unique solution, obtained by bisection for a scalar multiplier.
- Case C: a unique closed-form solution.
See (Guo et al., 15 Jul 2025) for the precise regime conditions, formulas, and interval choices; the bisection typically converges within a small number of steps. A generic sketch of this scalar root-finding step follows.
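The routine below is a schematic of the scalar root-finding used in regimes B/D. The actual stationarity condition comes from the KKT system in (Guo et al., 15 Jul 2025) and is not reproduced here; `g` is a placeholder, and the toy call at the end only illustrates the bisection mechanics.

```python
import numpy as np

def bisect_root(g, lo, hi, tol=1e-10, max_iter=60):
    """Generic bisection for a monotone scalar function g with g(lo)*g(hi) < 0."""
    glo = g(lo)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        gmid = g(mid)
        if abs(gmid) < tol or (hi - lo) < tol:
            return mid
        if np.sign(gmid) == np.sign(glo):
            lo, glo = mid, gmid     # root lies in the upper half-interval
        else:
            hi = mid                # root lies in the lower half-interval
    return 0.5 * (lo + hi)

# Toy illustration: the bracket halves every step, so roughly
# log2((hi - lo) / tol) iterations suffice.
lam = bisect_root(lambda t: t**3 - 2.0, 0.0, 2.0)
```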
6. Specialization to 2D Orthogonal Regression: BayesLineFit Algorithm
For the 2D orthogonal errors-in-variables line fit, the GRV-ML formalism reduces to parameter estimation for the line slope and intercept from a noisy set $\{(x_i, y_i)\}_{i=1}^N$. A scalar parameter is solved by bisection so that a stationarity condition derived from the KKT constraints is satisfied. Iteration proceeds by selecting a candidate value, evaluating the stationarity condition on the data, and halving the bracketing interval, converging in typically 10–20 steps with low per-iteration cost (Guo et al., 15 Jul 2025). This method is referred to as BayesLineFit. A simplified homoscedastic reference implementation is sketched below.
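As a point of comparison, the sketch below shows the classical homoscedastic limit (equal, isotropic errors on both axes), where maximum-likelihood orthogonal regression reduces to total least squares via the smallest eigenvector of the centered scatter matrix. It is a reference baseline, not the GRV-ML bisection routine.

```python
import numpy as np

def tls_line_fit(x, y):
    """Orthogonal (total-least-squares) line fit for the homoscedastic case.

    With equal, isotropic measurement errors on both axes, maximum-likelihood
    orthogonal regression minimizes the summed squared perpendicular distances;
    the line normal is the eigenvector of the centered scatter matrix with the
    smallest eigenvalue. Assumes the fitted line is not vertical.
    """
    xc, yc = x - x.mean(), y - y.mean()
    S = np.cov(np.vstack([xc, yc]))          # 2x2 scatter matrix
    _, eigvecs = np.linalg.eigh(S)           # columns sorted by ascending eigenvalue
    nx, ny = eigvecs[:, 0]                   # normal to the best-fit line
    slope = -nx / ny                         # from nx*(x - xbar) + ny*(y - ybar) = 0
    intercept = y.mean() - slope * x.mean()
    return slope, intercept
```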
7. Statistical Properties, Performance, and Implementation
The method achieves unbiased, self-consistent orthogonal regression that:
- Accounts for noise in both axes,
- Models arbitrary intrinsic distributions via normalizing flows,
- Empirically infers the latent error distributions and their correlations with the underlying values,
- Leverages GPU acceleration for scalability.
Performance benchmarks (Jing et al., 13 Nov 2024) indicate that for small samples ($N \le 300$), both ML-based and KS-test-based methods significantly outperform classical regression in low-SNR scenarios, while for $300 < N < 1000$ the ML-based approach provides the best results. For $N \ge 1000$, performance is comparable to existing state-of-the-art methods. The Python implementation "raddest", built on JAX and pzflow, will be released. The Cramér–Rao bound for model uncertainty is derived in closed form and reveals that input-matrix randomness can be beneficial, especially in underdetermined systems (Guo et al., 15 Jul 2025).
This suggests BayesLineFit is especially suited for modern astronomical datasets with complex distributions and strongly heterogeneous measurement errors, and may generalize to broader inverse problems with errors-in-variables and latent structure.