Likelihood-Free Density-Ratio Estimation

Updated 19 March 2026
  • Likelihood-Free Density-Ratio Estimation is a method that directly estimates the ratio p(x)/q(x) from samples without evaluating individual densities.
  • It leverages techniques such as divergence minimization, projection pursuit, spectral series, and flow/score-based approaches to model complex high-dimensional data.
  • The approach underpins applications in covariate shift adaptation, causal inference, and simulation-based inference while offering computational efficiency and statistical guarantees.

A likelihood-free density-ratio estimator is a statistical and machine learning tool for estimating the ratio of two unknown probability density functions—often $r(x) = p(x)/q(x)$—using only i.i.d. samples from $p$ and $q$, without explicit knowledge or estimation of either density. This paradigm underpins a broad range of methods for covariate shift adaptation, causal inference, model criticism, independence testing, mutual information estimation, domain adaptation, and likelihood-free inference in simulation-based models. Likelihood-free estimators avoid explicit likelihood evaluation, instead leveraging variational principles, divergence minimization, or discriminative learning, often in conjunction with sample-based approximation and function-approximation schemes such as neural networks, kernel expansions, or projection-pursuit bases.

1. Foundations of Likelihood-Free Density-Ratio Estimation

In the classical DRE setting, the objective is, given samples $\{x_i^p\}_{i=1}^{n_p}$ from $p(x)$ and $\{x_j^q\}_{j=1}^{n_q}$ from $q(x)$, to estimate $r^*(x) = p(x)/q(x)$. All modern likelihood-free DRE methods circumvent the intractability of marginal or conditional densities by directly modeling $r^*$ through sample-based loss functions. The standard approach is to cast DRE as the minimization of a strict divergence functional

$$D(p \parallel q \cdot r) = \int p(x)\,\phi(r(x))\,dx + \int q(x)\,\psi(r(x))\,dx$$

for suitable convex $\phi, \psi$, sidestepping the need to ever fit $p$ or $q$ directly. Common loss choices include the $L^2$ distance $L_2(r) = E_q[(r^*(x) - r(x))^2]$ and the unnormalized Kullback–Leibler divergence $\mathrm{UKL}(r) = E_q[r(x)] - E_p[\log r(x)]$—both depending only on samples and a functional model $r(x; \theta)$ (Wang et al., 1 Jun 2025).
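
To make the UKL loss concrete, here is a minimal PyTorch sketch of fitting a neural ratio model by minimizing the empirical $E_q[r(x)] - E_p[\log r(x)]$; the names (`RatioNet`, `fit_ukl`) and the architecture are illustrative assumptions, not from the cited papers. Parameterizing $\log r$ directly keeps $r$ positive by construction.

```python
# Minimal sketch (assumed names/architecture): fit log r(x; theta) by
# minimizing the empirical UKL loss  E_q[r(x)] - E_p[log r(x)].
import torch
import torch.nn as nn

class RatioNet(nn.Module):
    """Outputs log r(x; theta), so r = exp(output) is positive."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)  # log r(x)

def fit_ukl(xp, xq, steps=2000, lr=1e-3):
    """xp, xq: (n_p, d) and (n_q, d) sample tensors from p and q."""
    model = RatioNet(xp.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        # Empirical UKL: mean_q[exp(log r)] - mean_p[log r]
        loss = model(xq).exp().mean() - model(xp).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model  # model(x) estimates log p(x)/q(x)
```

The first-order condition $q(x) - p(x)/r(x) = 0$ shows the population minimizer is exactly $r^* = p/q$, so no density is ever evaluated.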

This likelihood-free property generalizes to simulator-based inference, mutual information estimation, and sequential Monte Carlo, with further task-specific modifications in loss form, parametrization, and sample structuring (Thomas et al., 2016).

2. Classes of Likelihood-Free Density-Ratio Estimators

2.1 Projection Pursuit Density-Ratio Estimation (ppDRE)

The projection pursuit estimator (Wang et al., 1 Jun 2025) decomposes the density ratio into a product of $K$ one-dimensional functions of linear projections (equivalently, the log-ratio into a sum): $r_K(x) = \prod_{k=1}^K f_k(a_k^\top x)$, where each $a_k$ is a unit-norm direction and each $f_k$ is a univariate function parameterized via a linear sieve basis (e.g., Hermite polynomials, Gaussian atoms). Estimation proceeds by iterative, stage-wise optimization, in which each partial ratio $r_{k-1}(x)$ is augmented by solving

$$\min_{f,\,a}\; E_q\!\left[ \left( r^*(x) - r_{k-1}(x)\, f(a^\top x) \right)^2 \right]$$

yielding computational and statistical efficiency in high dimensions (scaling up to $d \gg 100$), fast convergence rates under mild smoothness, and low sample complexity per projection direction. Empirically, ppDRE consistently surpasses conventional methods (uLSIF, KLIEP) above $d \sim 10$ (Wang et al., 1 Jun 2025).
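
A single stage can be sketched as follows. Because $r^*$ is unknown, the squared loss is first expanded via the identity $E_q[r^*(x)\,g(x)] = E_p[g(x)]$, which makes the objective computable from samples alone; the monomial basis for $f$ and the generic optimizer below are illustrative stand-ins for the paper's sieve bases, and the function names are hypothetical.

```python
# Hedged sketch of one projection-pursuit stage. Using E_q[r* g] = E_p[g],
# E_q[(r* - r_{k-1} f)^2] equals (up to the constant E_q[r*^2])
#   E_q[(r_{k-1} f)^2] - 2 E_p[r_{k-1} f].
import numpy as np
from scipy.optimize import minimize

def stage_objective(params, xp, xq, r_prev_p, r_prev_q):
    d = xp.shape[1]
    a = params[:d] / np.linalg.norm(params[:d])  # unit-norm direction a_k
    c = params[d:]                               # coefficients of f (polyval order)
    gp = r_prev_p * np.polyval(c, xp @ a)        # r_{k-1}(x) f(a^T x) on p-samples
    gq = r_prev_q * np.polyval(c, xq @ a)        # ... on q-samples
    return np.mean(gq ** 2) - 2.0 * np.mean(gp)

def fit_stage(xp, xq, r_prev_p, r_prev_q, degree=3, seed=0):
    rng = np.random.default_rng(seed)
    d = xp.shape[1]
    # Start from f = 1 (constant term only; np.polyval is highest-degree-first).
    x0 = np.concatenate([rng.normal(size=d), np.zeros(degree), [1.0]])
    res = minimize(stage_objective, x0, args=(xp, xq, r_prev_p, r_prev_q))
    a = res.x[:d] / np.linalg.norm(res.x[:d])
    return a, res.x[d:]
```

Starting from $r_0 \equiv 1$ (all-ones arrays for `r_prev_p`, `r_prev_q`), stages are applied greedily, multiplying each fitted $f_k(a_k^\top x)$ into the running ratio.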

2.2 Spectral Series Expansion

High-dimensional density-ratio estimation can also be recast as a series expansion in the eigenbasis of a kernel integral operator $T_q$ on $L^2(\mathcal{X}, q)$ (Izbicki et al., 2014):

$$r(x) \approx \sum_{j=1}^J \beta_j \psi_j(x)$$

The eigenfunctions $\psi_j$ and coefficients $\beta_j$ are estimated using the Nyström extension and empirical averages over samples from $q$ and $p$, respectively. Model selection is accomplished by cross-validation under the $L^2(q)$ risk, and the approach extends naturally to intractable-likelihood estimation through tensor-product expansions, yielding strong empirical risk guarantees and scalability in data geometry (Izbicki et al., 2014).
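
A sketch of this construction under simplifying assumptions (Gaussian kernel with fixed bandwidth and truncation level $J$, both of which the paper selects by cross-validation; function names are illustrative). It uses two facts: eigenvectors of the Gram matrix on $q$-samples give eigenfunctions orthonormal in $L^2(q)$, and $\beta_j = \langle r, \psi_j \rangle_{L^2(q)} = E_p[\psi_j(x)]$.

```python
# Hedged sketch of the spectral-series ratio estimator.
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_kernel(X, Y, bandwidth):
    return np.exp(-cdist(X, Y, "sqeuclidean") / (2 * bandwidth ** 2))

def spectral_series_ratio(xp, xq, x_eval, J=10, bandwidth=1.0):
    nq = len(xq)
    K = gaussian_kernel(xq, xq, bandwidth)       # Gram matrix on q-samples
    eigval, eigvec = np.linalg.eigh(K)
    idx = np.argsort(eigval)[::-1][:J]           # top-J eigenpairs
    lam, U = eigval[idx], eigvec[:, idx]

    def psi(X):
        # Nystrom extension: psi_j(x) = sqrt(nq)/lam_j * sum_i k(x, x_i^q) U_ij
        return np.sqrt(nq) * gaussian_kernel(X, xq, bandwidth) @ U / lam

    beta = psi(xp).mean(axis=0)                  # beta_j = E_p[psi_j(x)]
    return psi(x_eval) @ beta                    # r(x) ~ sum_j beta_j psi_j(x)
```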

2.3 Flow-Based and Score-Based Approaches

Modern techniques for intractable distributions employ continuous normalizing flows (CNFs) and score-based models. For example, in the scRatio formulation (Antipov et al., 27 Feb 2026), after fitting CNFs to each distribution, the log-density ratio is computed by integrating a single ODE along one generative path:

$$\log \frac{p_1^\theta(x)}{p_0^\theta(x)} = -\int_{t=1}^{t=0} \left[ \nabla \cdot \left(u_t^\theta(z_t|0) - u_t^\theta(z_t|1)\right) + \left(u_t^\theta(z_t|0) - u_t^\theta(z_t|1)\right)^\top s_t^\psi(z_t|0) \right] dt$$

This construction eliminates the numerical and computational instability of separately estimating each density, halving inference time and directly yielding the log-ratio for applications including genomics differential analysis, batch-effect removal, and combinatorial condition comparison.

Score-based approaches (e.g., DRE-∞ (Choi et al., 2021) and D3RE (Chen et al., 8 May 2025)) interpolate between $p_0$ and $p_1$ via bridging distributions $p_t(x)$ (along deterministic, stochastic, or optimal-transport paths) and learn the time score $s_t(x) = \partial_t \log p_t(x)$. Integrating the learned $s_\theta(x, t)$ over $t$ reconstructs the log-density ratio, with stability guaranteed via bridge dequantization and bounded time scores. D3RE further incorporates optimal-transport couplings (Schrödinger bridge) for minimal error and fewer function evaluations (Chen et al., 8 May 2025).
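
In its simplest fixed-$x$ form, the identity behind these methods is just the fundamental theorem of calculus, $\log p_1(x) - \log p_0(x) = \int_0^1 \partial_t \log p_t(x)\,dt$, so inference reduces to quadrature in $t$. A minimal sketch, assuming a hypothetical pre-trained network `score_net(x, t)` that returns $s_t(x)$ as a `(batch,)` tensor (this interface is an assumption, not a library API):

```python
# Hedged sketch: recover log p1(x)/p0(x) from a trained time-score network
# by trapezoidal quadrature over t in [0, 1].
import torch

def log_ratio_from_time_score(score_net, x, n_steps=128):
    ts = torch.linspace(0.0, 1.0, n_steps)
    with torch.no_grad():
        vals = torch.stack([
            score_net(x, torch.full((x.shape[0], 1), float(t))) for t in ts
        ])                                   # (n_steps, batch)
    return torch.trapezoid(vals, ts, dim=0)  # (batch,) log-ratio estimates
```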

2.4 Classification and f-Divergence–Based Methods

Many estimators, such as LFIRE (Thomas et al., 2016), classifier-based InfoNCE/Fenchel contrastive learning (Durkan et al., 2020, Papamakarios, 2019), and neural DRE (Moustakides et al., 2019), cast density-ratio estimation as a discriminative problem. A classifier distinguishes "joint" samples $(x, \theta) \sim p(x, \theta)$ from "product"/"reference" samples; the optimal classification rule, trained by cross-entropy, directly provides the likelihood ratio:

$$D^*(x, \theta) = \frac{p(x, \theta)}{p(x, \theta) + p(x)\,p(\theta)} \implies r(x; \theta) = \frac{D^*(x, \theta)}{1 - D^*(x, \theta)}$$

This framework unifies neural conditional density estimation (SNPE), contrastive losses, and regularized logistic regression, with extensions to high-dimensional summary selection, mutual information estimation, and amortized simulation-based inference.
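
A minimal sketch of this classifier trick in the unconditional two-sample case, using scikit-learn logistic regression: with class priors proportional to sample sizes, the Bayes-optimal logit equals $\log r(x) + \log(n_p/n_q)$, so correcting by the log prior ratio recovers the log-density ratio. The function name is illustrative.

```python
# Hedged sketch: density-ratio estimation via binary classification.
# Label p-samples 1 and q-samples 0; the logit plus log(n_q/n_p)
# estimates log p(x)/q(x).
import numpy as np
from sklearn.linear_model import LogisticRegression

def classifier_log_ratio(xp, xq, x_eval):
    X = np.vstack([xp, xq])
    y = np.concatenate([np.ones(len(xp)), np.zeros(len(xq))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    logit = clf.decision_function(x_eval)     # log D(x)/(1 - D(x))
    return logit + np.log(len(xq) / len(xp))  # correct for class imbalance
```

A linear logistic model restricts $\log r$ to be linear in the features; in practice the classifier is a neural network when the ratio is complex.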

2.5 RKHS and Regularized Bregman Losses

Kernel-based approaches model $r$ as an RKHS function, minimizing the regularized empirical Bregman divergence

$$J(f) = D_\varphi(r \parallel f) + \lambda \|f\|_{\mathcal{H}}^2$$

where $\lambda$ is selected adaptively by Lepskii's rule to minimize finite-sample error without requiring knowledge of the regularity (Zellinger et al., 2023). Closed-form solutions for the optimal $f^*$ are available via the representer theorem and linear-system solvers.
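
For the quadratic Bregman loss with a finite kernel expansion, the regularized solution is available in closed form; the sketch below is essentially the uLSIF estimator, with a ridge penalty on the coefficients standing in for the RKHS norm and a fixed Gaussian bandwidth as a placeholder.

```python
# Hedged sketch: closed-form quadratic-Bregman (uLSIF-style) estimator.
# Model r(x) = sum_l alpha_l k(x, c_l); minimizing
#   (1/2) E_q[r^2] - E_p[r] + (lam/2) ||alpha||^2
# gives alpha = (H + lam I)^{-1} h.
import numpy as np
from scipy.spatial.distance import cdist

def kernel(X, C, bandwidth=1.0):
    return np.exp(-cdist(X, C, "sqeuclidean") / (2 * bandwidth ** 2))

def ulsif(xp, xq, centers, lam=0.1, bandwidth=1.0):
    Kq = kernel(xq, centers, bandwidth)       # (n_q, L)
    Kp = kernel(xp, centers, bandwidth)       # (n_p, L)
    H = Kq.T @ Kq / len(xq)                   # H_{ll'} = E_q[k_l k_l']
    h = Kp.mean(axis=0)                       # h_l = E_p[k_l]
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda x: kernel(x, centers, bandwidth) @ alpha
```

Centers are typically a random subset of the $p$-samples; $\lambda$ and the bandwidth are chosen by cross-validation (or, in the cited work, Lepskii's rule).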

2.6 Direct Estimation in Exponential Families (KLIEP)

The KLIEP estimator models $r(x)$ within an exponential family and minimizes the empirical loss

$$L(\theta) = -\bar{T}^x \cdot \theta + \log\left( \frac{1}{m} \sum_{j=1}^m e^{\theta^\top T(Y_j)} \right)$$

Regularization is essential for existence and stability in high dimensions, with feasibility depending on whether the mean sufficient statistic falls within the convex hull of the reference sufficient statistics (Banzato et al., 18 Feb 2025).
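
A hedged sketch of this estimator: the ridge term `lam` plays the role of the regularization noted above, `Tx` and `Ty` hold the sufficient statistics $T(\cdot)$ of the two samples, and the function name is illustrative.

```python
# Hedged sketch: exponential-family density-ratio estimation (KLIEP-style).
# Minimize L(theta) = -Tbar . theta + log((1/m) sum_j exp(theta^T T(Y_j)))
# plus a ridge term for existence/stability in high dimensions.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def fit_exp_family_ratio(Tx, Ty, lam=1e-2):
    """Tx: (n, k) statistics of p-samples; Ty: (m, k) of q-samples."""
    Tbar = Tx.mean(axis=0)
    m = len(Ty)

    def loss(theta):
        return (-Tbar @ theta
                + logsumexp(Ty @ theta) - np.log(m)   # stabilized log-mean-exp
                + lam * theta @ theta)

    theta = minimize(loss, np.zeros(Tx.shape[1]), method="L-BFGS-B").x
    log_norm = logsumexp(Ty @ theta) - np.log(m)
    return lambda T: np.exp(T @ theta - log_norm)     # r(x) from T(x)
```

Before fitting, the feasibility condition can be checked empirically: the unregularized problem is well-posed only if $\bar{T}^x$ lies in the convex hull of $\{T(Y_j)\}$.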

3. Algorithmic Implementation and Practical Issues

Most likelihood-free estimators share the following workflow:

  1. Sampling: Obtain i.i.d. samples from the target $p$ and reference $q$ (possibly with dequantization or bridge construction).
  2. Modeling: Parameterize $r(x)$ using a neural network, basis expansion, or kernel method.
  3. Loss Function: Choose a divergence, moment-matching, or classification-based loss.
  4. Optimization: Use gradient descent, alternating minimization, or convex optimization, depending on the method.
  5. Model Selection and Calibration: Use cross-validation, regularization path, or parameter selection principles (e.g., Lepskii rule, log-sum-exp stabilization).
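
As an illustration of step 5, the $L^2(q)$ risk is estimable from held-out samples alone, since $J(r) = \tfrac{1}{2}E_q[r(x)^2] - E_p[r(x)]$ differs from $\tfrac{1}{2}E_q[(r(x) - r^*(x))^2]$ only by a constant in $r$. The sketch below selects a regularization parameter by this criterion; `fit` is any assumed estimator factory returning a callable ratio (e.g., a wrapper around the uLSIF sketch above).

```python
# Hedged sketch of model selection by held-out L2(q) risk.
import numpy as np

def holdout_risk(ratio, xp_val, xq_val):
    # J(r) = (1/2) E_q[r^2] - E_p[r], estimated on held-out samples
    return 0.5 * np.mean(ratio(xq_val) ** 2) - np.mean(ratio(xp_val))

def select_lambda(fit, xp, xq, lambdas, frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    ip, iq = rng.permutation(len(xp)), rng.permutation(len(xq))
    sp, sq = int(frac * len(xp)), int(frac * len(xq))
    xp_tr, xp_val = xp[ip[:sp]], xp[ip[sp:]]
    xq_tr, xq_val = xq[iq[:sq]], xq[iq[sq:]]
    risks = [holdout_risk(fit(xp_tr, xq_tr, lam), xp_val, xq_val)
             for lam in lambdas]
    return lambdas[int(np.argmin(risks))]
```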

Empirical results demonstrate that nonparametric and projection-pursuit methods achieve superior accuracy and scalability for $d \geq 10$ (Wang et al., 1 Jun 2025), while flow-based and time-score approaches remain robust even at $d = 100$–$320$ (Chen et al., 8 May 2025, Antipov et al., 27 Feb 2026, Choi et al., 2021).

4. Theoretical Guarantees

Many estimators achieve statistical consistency and nonparametric minimax optimal rates. For example, under sieve-regression conditions, ppDRE achieves

$$\sup_x \left| \hat{r}_K(x) - r_K(x) \right| = \sum_{\ell=1}^K O_p\!\left( J_\ell^{-(s-1)} + \sqrt{J_\ell/n} \right)$$

Regularized RKHS methods provide adaptive minimax bounds

$$\|\hat{f} - r\|_{L^2(q)} \leq C\, n^{-(2s\alpha+\alpha)/(2s\alpha+\alpha+1)}$$

and series expansions enjoy analogous L²-risk guarantees. Score-based and flow-based schemes offer approximation guarantees contingent on smoothness and regularity of bridge paths and score networks (Wang et al., 1 Jun 2025, Izbicki et al., 2014, Zellinger et al., 2023, Chen et al., 8 May 2025).

Classifier-based and contrastive estimators are consistent as the number of samples and model capacity increase, directly recovering the log-likelihood ratio in the infinite-data limit (Papamakarios, 2019, Durkan et al., 2020).

5. Applications and Empirical Comparisons

Likelihood-free density-ratio estimation has been successfully applied in:

  • Covariate shift adaptation and domain adaptation
  • Causal inference and model criticism
  • Mutual information estimation and independence testing
  • Simulation-based (likelihood-free) posterior inference with summary-statistic selection
  • Genomics differential analysis and batch-effect removal

Empirical results consistently demonstrate superior estimation error, sample efficiency, and stability for projection-pursuit, spectral series, flow-based, score-based, and telescoping estimators when compared to traditional methods such as uLSIF, KLIEP, and noise-contrastive estimation (Wang et al., 1 Jun 2025, Izbicki et al., 2014, Rhodes et al., 2020).

6. Limitations and Open Challenges

While likelihood-free DRE solutions are powerful, they face some limitations:

  • Curse of Dimensionality: Despite improvements, extremely high-dimensional data may require careful architecture or feature representations.
  • Bridge/Path Construction: The design and stability of interpolating paths (both deterministic and stochastic) are critical for accurate time-score-based estimation; stability and support coverage are addressed via methods such as dequantified diffusion bridges (Chen et al., 8 May 2025).
  • Hyperparameter Sensitivity: Choice of regularization, basis size, bridge parameters, and path discretization may require tuning.
  • Existence and Well-posedness: For parametric exponential family estimators, precise feasibility conditions and necessary regularization constraints must be checked a priori (Banzato et al., 18 Feb 2025).
  • Computational Complexity: High computational cost can arise in kernel eigendecomposition, Sinkhorn iterations, and ODE solvers, but it can often be amortized or approximated via modern numerical methods.

Ongoing directions include theoretical sample-complexity bounds for multi-bridge methods, learned adaptive path construction, algorithmic acceleration for kernel and Sinkhorn steps, and extension of DRE theory to more general divergence-based and conditional frameworks (Rhodes et al., 2020, Chen et al., 8 May 2025, Choi et al., 2021).

7. Summary Table of Representative Likelihood-Free DRE Methods

| Method / Reference | Parametric Model | Loss Principle | Scalability & Domain |
|---|---|---|---|
| ppDRE (Wang et al., 1 Jun 2025) | Product of 1D sieves | $L^2$, UKL | $d \sim 100+$; covariate shift, MI |
| Spectral series (Izbicki et al., 2014) | Kernel eigenbasis | $L^2$ | High $d$; likelihood-free inference |
| scRatio (Antipov et al., 27 Feb 2026) | Conditional flows | ODE log-ratio | Genomics, $d$ up to 320; efficiency |
| D3RE (Chen et al., 8 May 2025) | Score network | Time-score matching | Uniform error, fast convergence |
| Telescoping DRE (Rhodes et al., 2020) | Chained classifiers | Logistic / NCE | Large KL gap, MI estimation |
| LFIRE (Thomas et al., 2016) | Regularized logistic | Contrastive | Posterior with summary selection |
| RKHS Bregman (Zellinger et al., 2023) | Kernel regression | Quadratic/KL loss | Adaptive rates, two-sample testing |
| KLIEP (Banzato et al., 18 Feb 2025) | Exponential-family ratio | Convex, regularized | High $d$, convex-hull check |

All methods are fully likelihood-free: no density $p$ or $q$ is explicitly evaluated; only samples, sample averages, and model outputs through the chosen basis, network, or kernel structure are used.

