Maximum-likelihood Orthogonal Regression (BayesLineFit)
- The paper introduces a maximum-likelihood orthogonal regression framework using empirical Bayes and variational inference to yield unbiased estimates in the presence of heteroscedastic noise.
- It details a convex-analytic solution with closed-form iterations for 2D orthogonal line fitting that ensures global optimality and computational efficiency.
- The application to astronomical datasets such as PHANGS-ALMA and PHANGS-JWST leverages GPU acceleration and normalizing flows to robustly model complex data with varying measurement uncertainties.
Maximum-likelihood Orthogonal Regression (BayesLineFit) refers to a family of statistically rigorous approaches for fitting linear relationships to data with measurement uncertainty in both variables, under the maximum-likelihood principle. The most recent developments synthesize the classical errors-in-variables perspective with modern Bayesian nonparametric modeling, notably using normalizing flows. The “BayesLineFit” framework encompasses both (1) a general maximum-likelihood orthogonal regression model leveraging empirical Bayes via variational inference and (2) a convex-analytic solution to the Gaussian linear errors-in-variables problem, with efficient algorithms and closed-form reduction for the 2-D orthogonal line case. These methods yield unbiased orthogonal regression estimates, explicitly accounting for heteroscedastic measurement noise, nontrivial population distributions, and latent variable dependencies.
1. Model Formulation and Likelihood Principle
Maximum-likelihood orthogonal regression assumes each observed point $(x_i, y_i)$ is a noisy observation of a latent point $(\xi_i, \eta_i)$. Noise is modeled as independent, heteroscedastic Gaussian perturbations,
$$x_i = \xi_i + \epsilon_{x,i}, \qquad y_i = \eta_i + \epsilon_{y,i}, \qquad \epsilon_{x,i} \sim \mathcal{N}(0, \sigma_{x,i}^2), \quad \epsilon_{y,i} \sim \mathcal{N}(0, \sigma_{y,i}^2),$$
with per-datum error scales $\sigma_{x,i}$ and $\sigma_{y,i}$. In the simple case of orthogonal line fitting, the regression law is
$$\eta = a\,\xi + b,$$
where $a$ is the slope, $b$ is the intercept, and $\sigma_{\mathrm{int}}$ is the intrinsic orthogonal scatter about the line. The joint likelihood for a single datum marginalizes over the latent value,
$$\mathcal{L}_i(\boldsymbol{\theta}, \boldsymbol{\psi}) = \int \mathcal{N}\!\big(x_i \mid \xi,\, \sigma_{x,i}^2\big)\; \mathcal{N}\!\big(y_i \mid a\xi + b,\; \sigma_{y,i}^2 + (1+a^2)\,\sigma_{\mathrm{int}}^2\big)\; p_\xi(\xi \mid \boldsymbol{\psi})\, \mathrm{d}\xi,$$
where $p_\xi$ and its analogs characterize the "population" distributions, modeled nonparametrically, for the independent and dependent variables and their uncertainties. The global likelihood is $\mathcal{L}(\boldsymbol{\theta}, \boldsymbol{\psi}) = \prod_i \mathcal{L}_i(\boldsymbol{\theta}, \boldsymbol{\psi})$, where $\boldsymbol{\theta} = (a, b, \sigma_{\mathrm{int}})$ are the regression parameters and $\boldsymbol{\psi}$ are the nuisance (population) parameters (Jing et al., 13 Nov 2024).
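The marginalization above can be made concrete with a minimal NumPy/SciPy sketch that evaluates a single datum's likelihood by Gauss–Hermite quadrature. Here a plain Gaussian stands in for the learned population density $p_\xi$ (a normalizing flow in the actual method), and the helper names and the $(1+a^2)$ projection of the orthogonal scatter are illustrative assumptions rather than the published implementation.

```python
import numpy as np
from scipy.stats import norm

def p_xi(xi):
    # Hypothetical stand-in for the learned population density of the true xi;
    # the real method uses a normalizing flow fitted by empirical Bayes.
    return norm.pdf(xi, loc=0.0, scale=2.0)

def datum_marginal_likelihood(x, y, sx, sy, a, b, sigma_int, n_nodes=40):
    """Marginal likelihood of one datum: integrate the latent xi with
    Gauss-Hermite quadrature centred on the Gaussian N(x, sx^2)."""
    t, w = np.polynomial.hermite.hermgauss(n_nodes)   # nodes/weights for weight exp(-t^2)
    xi = x + np.sqrt(2.0) * sx * t                    # substitution xi = x + sqrt(2)*sx*t
    # Vertical scatter: measurement variance plus intrinsic orthogonal scatter
    # projected onto the y axis (the (1 + a^2) factor is an assumption, see text).
    var_y = sy**2 + (1.0 + a**2) * sigma_int**2
    integrand = norm.pdf(y, loc=a * xi + b, scale=np.sqrt(var_y)) * p_xi(xi)
    return np.sum(w * integrand) / np.sqrt(np.pi)

# One datum with heteroscedastic errors on both axes
print(datum_marginal_likelihood(x=1.3, y=2.9, sx=0.2, sy=0.5, a=2.0, b=0.3, sigma_int=0.1))
```

The composite likelihood is the product of such terms over all data, and the regression parameters are obtained by maximizing its logarithm.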
2. Empirical Bayes, Variational Inference, and Normalizing Flows
Because the true population distributions of the independent variable and its uncertainties (and their analogs for the dependent variable) are typically unknown in practice, an empirical Bayes methodology is employed. Each is modeled with a normalizing flow, i.e., an invertible bijective map $f$ such that the transformed variable $z = f^{-1}(\xi)$ follows a tractable base distribution, usually a standard Gaussian:
$$p_\xi(\xi) = p_z\!\big(f^{-1}(\xi)\big)\,\left|\det \frac{\partial f^{-1}(\xi)}{\partial \xi}\right|.$$
Empirical Bayes estimation fits these flows to maximize the marginal likelihood of the observed data, which involves integrating over the latent true value $\xi$ using Gauss–Hermite quadrature in 1D. An analogous process applies independently to the dependent variable (Jing et al., 13 Nov 2024).
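As a toy illustration of the change-of-variables mechanism (a fixed bijection rather than the trainable spline couplings used by pzflow), mapping a standard-normal base through $\xi = e^{z}$ already yields a markedly non-Gaussian population density:

```python
import numpy as np
from scipy.stats import norm

def flow_density(xi):
    # Density induced by the fixed bijection xi = exp(z) with a standard-normal base:
    # p(xi) = p_z(f^{-1}(xi)) * |d f^{-1}/d xi|, i.e. a log-normal distribution.
    z = np.log(xi)                    # inverse map f^{-1}(xi)
    jac = 1.0 / xi                    # |d f^{-1} / d xi|
    return norm.pdf(z) * jac

xi = np.linspace(1e-3, 50.0, 5000)
density = flow_density(xi)
print(np.sum(density) * (xi[1] - xi[0]))   # ~1: the transformed density still normalizes
```

Trained flows replace the fixed map with learnable, expressive bijections while keeping exactly this form of density evaluation.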
Once the population flows are learned, the regression parameters are optimized by maximizing the full composite likelihood, leveraging automatic differentiation. The entire process is GPU-accelerated, and the normalizing flows are implemented using the pzflow library with RollingSplineCoupling layers (Jing et al., 13 Nov 2024).
3. Convex-analytic and Algorithmic Solution: GRV-ML and BayesLineFit
The "GRV-ML" (generalized RV-ML) algorithm solves the Gaussian errors-in-variables MLE for general linear regression, admitting both overdetermined and underdetermined cases (Guo et al., 15 Jul 2025). Given observed ,
where is a known matrix, is random Gaussian i.i.d. matrix uncertainty, and is observation noise. Marginalizing over yields
The negative log-likelihood is
which is generally nonconvex in , but "lifting" to suitable slack variables yields a convex program. This admits strong duality, explicit KKT conditions, and a bisection/root-finding algorithm for the global solution. In the case of 2D orthogonal regression (i.e., classical line fitting with noisy and ), the solution reduces to a closed-form iteration:
where the Lagrange multiplier is solved via scalar bisection. This specific formulation is conventionally called BayesLineFit (Guo et al., 15 Jul 2025).
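The sketch below evaluates this marginalized negative log-likelihood on a synthetic instance and minimizes it with a generic local optimizer, initialized at the ordinary least-squares solution. It is a stand-in for the convex lifted program and bisection scheme described above, and all sizes and noise levels are purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic errors-in-variables instance: y = (A + E) x_true + w  (toy dimensions).
m, n = 50, 2
sigma_E, sigma_w = 0.1, 0.2
A = rng.normal(size=(m, n))
x_true = np.array([1.5, -0.7])
y = (A + sigma_E * rng.normal(size=(m, n))) @ x_true + sigma_w * rng.normal(size=m)

def neg_log_lik(x):
    """Marginalized negative log-likelihood (up to constants); nonconvex in x."""
    s2 = sigma_E**2 * np.dot(x, x) + sigma_w**2      # effective noise variance
    r = y - A @ x
    return 0.5 * np.dot(r, r) / s2 + 0.5 * m * np.log(s2)

x_ls = np.linalg.lstsq(A, y, rcond=None)[0]          # ordinary least squares, for reference
res = minimize(neg_log_lik, x_ls, method="Nelder-Mead")
print("LS estimate:    ", x_ls)
print("GRV-ML estimate:", res.x)
```

In contrast to this brute-force local minimization, the lifting converts the problem into a convex program whose global optimum is certified by the KKT conditions.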
4. Implementation Details and Computational Scheme
Key implementation practices include:
- Pretraining normalizing flows on "shrunk" data for stability.
- Training flows via a full-batch Adam optimizer (fixed learning rate, with early stopping at a patience of 10 epochs).
- Marginal likelihood evaluations via 1D Gauss–Hermite quadrature (approx. 40 nodes per integral).
- For regression-parameter optimization, automatic differentiation of the log-likelihood with respect to $(a, b, \sigma_{\mathrm{int}})$.
- GPU acceleration for flow training, quadrature, and posterior sampling.
- For the analytic GRV-ML form, the main computational steps are an SVD factorization, sign-invariant transformations, and, in the 2D case, a cheap scalar bisection over the Lagrange multiplier at each iteration. Convergence is typically achieved within 10–20 iterations (Jing et al., 13 Nov 2024, Guo et al., 15 Jul 2025).
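The exact KKT equations of GRV-ML are not reproduced here; as a structural illustration of the "factorize once, then bisect on a scalar multiplier" pattern from the last item above, the sketch below solves a norm-constrained least-squares subproblem by SVD plus scalar root finding. The subproblem, the target norm, and all numbers are stand-ins rather than the paper's equations.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 2))
y = A @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=30)
delta = 1.5                                   # illustrative target norm for the solution

# Factorize once; each candidate multiplier is then cheap to evaluate.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
c = U.T @ y                                   # data projected onto the left singular vectors

def norm_gap(lam):
    """||x(lam)||^2 - delta^2 with x(lam) = (A^T A + lam I)^{-1} A^T y; monotone in lam."""
    return np.sum((s * c / (s**2 + lam))**2) - delta**2

lam_star = brentq(norm_gap, 1e-12, 1e6)       # scalar bracketing root finder (bisection-like)
x_star = Vt.T @ (s * c / (s**2 + lam_star))
print(lam_star, x_star, np.linalg.norm(x_star))
```

Each evaluation of norm_gap costs only a handful of flops once the SVD is available, which is why such schemes converge within a few dozen scalar iterations.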
5. Performance Characteristics and Regimes of Validity
BayesLineFit exhibits unbiasedness and self-consistency, correctly accounting for heteroscedastic errors in both axes, arbitrary intrinsic distributions, and latent error correlations. Tests show that with sample sizes exceeding 1000, both the ML-based and KS-test-based variants substantially outperform previous approaches, especially at low signal-to-noise. For moderate samples ($300 < N < 1000$), the ML-based method yields the best performance. In the low-sample regime (below a few hundred points), competitive results are maintained versus state-of-the-art alternatives (Jing et al., 13 Nov 2024).
The analytic GRV-ML admits a complete KKT characterization distinguishing four regimes, which depend on the rank of $\mathbf{A}$ and on the signs of certain feasibility scalars; in underdetermined cases it allows for non-uniqueness, or benefits from the randomness of $\mathbf{E}$, which supplies additional norm information on $\mathbf{x}$ (Guo et al., 15 Jul 2025). The Cramér–Rao bound under the marginalized model is explicit and, in the limit of vanishing matrix uncertainty, reduces to the classic least-squares CRB (Guo et al., 15 Jul 2025).
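For reference, the limiting bound being recovered is the standard least-squares Cramér–Rao bound for the linear Gaussian model $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}$ with $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma_w^2 \mathbf{I})$ and no matrix uncertainty:
$$\operatorname{Cov}(\hat{\mathbf{x}}) \succeq \sigma_w^2 \left(\mathbf{A}^\top \mathbf{A}\right)^{-1},$$
which the marginalized-model bound approaches as $\sigma_E^2 \to 0$.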
6. Applications, Extensions, and Comparative Context
BayesLineFit is specifically developed for regression on astronomical data with realistic, non-Gaussian distributions and varying uncertainties. Its performance on both mock and real datasets from PHANGS-ALMA and PHANGS-JWST demonstrates reliability under challenging, signal-poor and structurally complex regimes (Jing et al., 13 Nov 2024). The normalizing flow approach allows the method to adapt to arbitrary intrinsic population and error structures beyond any fixed parametric form.
The GPU-based Python code "raddest" implementing these methods is slated for public release, ensuring reproducibility and extensibility. In contrast to expectation-maximization approaches and previous RV-ML/ODR solvers, the convexity and lifting arguments underlying GRV-ML/BayesLineFit yield global guarantees and superior algorithmic efficiency, as well as robust handling of both underdetermined and overdetermined problems (Guo et al., 15 Jul 2025).
7. Relation to Broader Regression Paradigms
Maximum-likelihood orthogonal regression via BayesLineFit provides a unified, statistically principled solution to the errors-in-variables linear fitting problem, augmenting classical total least squares, ODR, and Bayesian hierarchical regression by:
- Explicitly accounting for both measurement uncertainty and intrinsic scatter
- Learning and modeling population structure nonparametrically via normalizing flows
- Accommodating arbitrary error correlations and non-Gaussian data
- Delivering provably convex optimization and analytic KKT-based solution structure in the linear Gaussian case (Jing et al., 13 Nov 2024, Guo et al., 15 Jul 2025)
A plausible implication is that this framework generalizes to multivariate and nonlinear settings, provided the computational cost of the empirical Bayes and normalizing flow components remains tractable. The method thus advances the state-of-the-art in orthogonal regression for complex, noise-dominated scientific data sets.