Relative uLSIF (RuLSIF)

Updated 16 June 2026

Relative uLSIF (RuLSIF) is a non-parametric method that estimates α-relative density ratios using an α-mixture, ensuring bounded estimates and enhanced numerical stability.
It employs kernel expansion and squared-loss minimization with regularization, resulting in an analytic closed-form solution that avoids iterative quadratic programming.
RuLSIF is effective in change-point detection for time series, demonstrating improved convergence rates, robustness, and computational efficiency over methods like uLSIF and KLIEP.

Relative uLSIF (RuLSIF) is a non-parametric statistical method for direct estimation of α-relative density ratios and their associated divergences between two probability distributions. RuLSIF generalizes the unconstrained Least-Squares Importance Fitting (uLSIF) method by introducing an α-mixture in the denominator of the density ratio, conferring boundedness and improved numerical stability. It is particularly effective in applications such as change-point detection in time-series settings, where robust, analytic, and sample-efficient divergence estimation is required (Liu et al., 2012).

1. Formal Definition of α-Relative Density Ratio and Relative Pearson Divergence

Let $p(x)$ and $q(x)$ be two probability densities on $\mathbb{R}^d$. For a prescribed parameter $0 \leq \alpha < 1$, RuLSIF introduces the α-mixture density

$p_\alpha(x) = \alpha\,p(x) + (1-\alpha) q(x),$

and defines the α-relative density ratio as

$r_\alpha(x) = \frac{p(x)}{p_\alpha(x)} = \frac{p(x)}{\alpha\,p(x) + (1-\alpha) q(x)}.$

The associated α-relative Pearson (PE) divergence is then given as:

$D_\mathrm{PE}^\alpha(p\|q) = \frac{1}{2} \int p_\alpha(x) \left(r_\alpha(x) - 1\right)^2 dx,$

quantifying the discrepancy between $p(x)$ and $q(x)$ through their α-relative density ratio. For $\alpha=0$, this reduces to the conventional density-ratio formulation used in uLSIF.
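The definitions above can be checked numerically on a grid. The following is a minimal sketch, assuming two one-dimensional Gaussian densities chosen only for illustration; the function names, the grid, and the value $\alpha = 0.1$ are assumptions, not part of the original method description.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return np.exp(-0.5 * z**2) / (std * np.sqrt(2.0 * np.pi))

def relative_ratio(p, q, alpha):
    """alpha-relative density ratio p / (alpha*p + (1-alpha)*q) on a grid."""
    return p / (alpha * p + (1.0 - alpha) * q)

def relative_pe(p, q, alpha, dx):
    """Riemann-sum approximation of the alpha-relative PE divergence."""
    p_alpha = alpha * p + (1.0 - alpha) * q
    r = p / p_alpha
    return 0.5 * np.sum(p_alpha * (r - 1.0) ** 2) * dx

# Illustrative densities (assumed): p = N(0, 1), q = N(1, 0.5^2).
x = np.linspace(-8.0, 8.0, 16001)
dx = x[1] - x[0]
p = gaussian_pdf(x, 0.0, 1.0)
q = gaussian_pdf(x, 1.0, 0.5)

alpha = 0.1
r_rel = relative_ratio(p, q, alpha)    # bounded above by 1/alpha
r_plain = relative_ratio(p, q, 0.0)    # plain ratio p/q, grows in the tails
pe_rel = relative_pe(p, q, alpha, dx)  # positive when p != q
pe_same = relative_pe(p, p, alpha, dx) # zero when both densities coincide
```

Because $q$ has lighter tails than $p$ in this example, the plain ratio grows without bound in the tails, while the α-relative ratio stays below $1/\alpha$.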

2. Direct Density-Ratio Estimation via Squared-Loss with Regularization

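RuLSIF models the unknown $r_\alpha(x)$ by a kernel expansion:

$g(x; \theta) = \sum_{\ell=1}^{b} \theta_\ell K(x, c_\ell),$

where $\{c_\ell\}_{\ell=1}^{b}$ are basis points (often sampled from $p$, or from both $p$ and $q$) and $K$ is a positive-definite kernel, typically Gaussian. The method determines $\theta = (\theta_1, \dots, \theta_b)^\top$ by minimizing the regularized expected squared error:

$J(\theta) = \frac{1}{2} \int p_\alpha(x) \left(g(x; \theta) - r_\alpha(x)\right)^2 dx + \frac{\lambda}{2} \|\theta\|^2,$

with regularization parameter $\lambda > 0$. Expanding this, the minimization can be expressed as:

$\min_{\theta \in \mathbb{R}^b} \left[ \frac{1}{2} \theta^\top H \theta - h^\top \theta + \frac{\lambda}{2} \theta^\top \theta \right],$

where

$H_{\ell\ell'} = \alpha\,\mathbb{E}_{p}\left[K(x, c_\ell) K(x, c_{\ell'})\right] + (1-\alpha)\,\mathbb{E}_{q}\left[K(x, c_\ell) K(x, c_{\ell'})\right], \qquad h_\ell = \mathbb{E}_{p}\left[K(x, c_\ell)\right].$

Empirical estimation replaces expectations with averages over samples $\{x_i\}_{i=1}^{n} \sim p$ and $\{x'_j\}_{j=1}^{n'} \sim q$:

$\hat{H}_{\ell\ell'} = \frac{\alpha}{n} \sum_{i=1}^{n} K(x_i, c_\ell) K(x_i, c_{\ell'}) + \frac{1-\alpha}{n'} \sum_{j=1}^{n'} K(x'_j, c_\ell) K(x'_j, c_{\ell'}), \qquad \hat{h}_\ell = \frac{1}{n} \sum_{i=1}^{n} K(x_i, c_\ell).$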
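
The expansion step rests on the identity $p_\alpha(x)\,r_\alpha(x) = p(x)$, which removes the unknown ratio from the cross term. A worked version is sketched here, writing $g$ for the kernel model of $r_\alpha$ and $C$ for a constant that does not depend on the model parameters (both symbols are notation assumed for this sketch):

$\begin{aligned} \frac{1}{2} \int p_\alpha \left(g - r_\alpha\right)^2 dx &= \frac{1}{2} \int p_\alpha\, g^2\, dx - \int p_\alpha\, r_\alpha\, g\, dx + \frac{1}{2} \int p_\alpha\, r_\alpha^2\, dx \\ &= \frac{\alpha}{2}\, \mathbb{E}_{p}\left[g^2\right] + \frac{1-\alpha}{2}\, \mathbb{E}_{q}\left[g^2\right] - \mathbb{E}_{p}\left[g\right] + C. \end{aligned}$

Only expectations under $p$ and $q$ remain, so every term can be estimated from samples without knowing either density. Substituting a model that is linear in its parameters turns the first two terms into a quadratic form and the third into a linear term.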

3. Analytic Solution and Computational Considerations

Setting the gradient of the empirical objective to zero, $\nabla_\theta \hat{J}(\theta) = 0$, RuLSIF yields the closed-form solution:

$\hat{\theta} = \left(\hat{H} + \lambda I_b\right)^{-1} \hat{h},$

where $I_b$ is the $b \times b$ identity matrix. The density-ratio estimate at any $x$ is then

$\hat{r}_\alpha(x) = g(x; \hat{\theta}) = \sum_{\ell=1}^{b} \hat{\theta}_\ell K(x, c_\ell).$

Dominant computational costs are the formation of the $b \times b$ matrix $\hat{H}$ ($O(b^2 (n + n'))$) and its inversion ($O(b^3)$). The number of basis points $b$ is often set such that $b \ll n$ using subsampling or reduced kernel centers. Unlike KLIEP, which is fitted iteratively, RuLSIF requires no iterative solver such as quadratic programming (QP), resulting in faster computation and greater numerical stability for large-scale problems.
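
The closed-form fit described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names, the Gaussian kernel, the synthetic data, and the values chosen for the mixing parameter, kernel width, and regularization strength are all assumptions.

```python
import numpy as np

def gaussian_kernel(x, centers, sigma):
    """Gaussian kernel matrix of shape (len(x), len(centers))."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma**2))

def rulsif_moments(x_p, x_q, centers, alpha, sigma):
    """Empirical H (b x b) and h (b,) from samples of p and q."""
    k_p = gaussian_kernel(x_p, centers, sigma)
    k_q = gaussian_kernel(x_q, centers, sigma)
    h_mat = alpha * k_p.T @ k_p / len(x_p) + (1.0 - alpha) * k_q.T @ k_q / len(x_q)
    h_vec = k_p.mean(axis=0)
    return h_mat, h_vec

def rulsif_fit(h_mat, h_vec, lam):
    """Solve (H + lam * I) theta = h; a linear solve replaces the explicit inverse."""
    return np.linalg.solve(h_mat + lam * np.eye(len(h_vec)), h_vec)

def rulsif_ratio(x, centers, theta, sigma):
    """Evaluate the estimated relative density ratio at the rows of x."""
    return gaussian_kernel(x, centers, sigma) @ theta

# Synthetic data (assumed): p = N(0, 1), q = N(0.5, 1), in one dimension.
rng = np.random.default_rng(0)
x_p = rng.normal(0.0, 1.0, size=(300, 1))
x_q = rng.normal(0.5, 1.0, size=(300, 1))
centers = x_p[:50]  # basis points: a subset of the p-samples

alpha, sigma, lam = 0.1, 1.0, 0.1
h_mat, h_vec = rulsif_moments(x_p, x_q, centers, alpha, sigma)
theta = rulsif_fit(h_mat, h_vec, lam)
r_hat = rulsif_ratio(x_p, centers, theta, sigma)
```

Using `np.linalg.solve` instead of forming the inverse is a common numerical choice; the cost is still cubic in the number of basis points.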

4. Algorithmic Implementation Details

Key components in practical deployment are summarized as follows:

Component	Common Choices / Procedures	Notes
Kernel $K(x, c)$	Gaussian ($\exp(-\|x - c\|^2 / (2\sigma^2))$)	Positive-definite; $\sigma$ typically cross-validated
Basis points $\{c_\ell\}_{\ell=1}^{b}$	All $p$-samples or random subset	Subsampling manages computational cost
$\alpha$	Small positive value ($0 < \alpha < 1$)	Ensures $r_\alpha(x) \leq 1/\alpha$
$\sigma$, $\lambda$	K-fold cross-validation	Based on minimizing hold-out $\hat{J}$

The computation of $\hat{H}$ and $\hat{h}$ is $O(b^2 (n + n'))$, and solution of the linear system is $O(b^3)$. This approach offers significant advantages in runtime and scalability compared to iterative QP methods.
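
The cross-validation row of the table can be made concrete with a small grid search. The sketch below is one possible arrangement under stated assumptions: the fold assignment, the candidate grids, the synthetic data, and all function names are illustrative, and the basis points are taken from the full sample for simplicity.

```python
import numpy as np

def gaussian_kernel(x, centers, sigma):
    """Gaussian kernel matrix of shape (len(x), len(centers))."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma**2))

def fit_theta(k_p, k_q, alpha, lam):
    """Closed-form coefficients from precomputed kernel matrices."""
    b = k_p.shape[1]
    h_mat = alpha * k_p.T @ k_p / len(k_p) + (1.0 - alpha) * k_q.T @ k_q / len(k_q)
    return np.linalg.solve(h_mat + lam * np.eye(b), k_p.mean(axis=0))

def holdout_loss(k_p, k_q, theta, alpha):
    """Unregularized squared-loss criterion evaluated on held-out samples."""
    g_p, g_q = k_p @ theta, k_q @ theta
    return (0.5 * alpha * np.mean(g_p**2)
            + 0.5 * (1.0 - alpha) * np.mean(g_q**2)
            - np.mean(g_p))

def select_sigma_lambda(x_p, x_q, centers, alpha, sigmas, lams, n_folds=5, seed=0):
    """Return (best_loss, best_sigma, best_lambda) from a K-fold grid search."""
    rng = np.random.default_rng(seed)
    fold_p = rng.permutation(len(x_p)) % n_folds  # fold label per p-sample
    fold_q = rng.permutation(len(x_q)) % n_folds  # fold label per q-sample
    best = (np.inf, None, None)
    for sigma in sigmas:
        k_p = gaussian_kernel(x_p, centers, sigma)
        k_q = gaussian_kernel(x_q, centers, sigma)
        for lam in lams:
            losses = []
            for f in range(n_folds):
                theta = fit_theta(k_p[fold_p != f], k_q[fold_q != f], alpha, lam)
                losses.append(holdout_loss(k_p[fold_p == f], k_q[fold_q == f], theta, alpha))
            score = float(np.mean(losses))
            if score < best[0]:
                best = (score, sigma, lam)
    return best

# Synthetic data and candidate grids (assumed for illustration).
rng = np.random.default_rng(1)
x_p = rng.normal(0.0, 1.0, size=(200, 1))
x_q = rng.normal(0.5, 1.0, size=(200, 1))
sigmas, lams = [0.3, 1.0, 3.0], [0.01, 0.1, 1.0]
best_loss, best_sigma, best_lam = select_sigma_lambda(
    x_p, x_q, x_p[:40], alpha=0.1, sigmas=sigmas, lams=lams)
```

Computing each kernel matrix once per kernel width and slicing its rows per fold keeps the search cheap, since only the small linear system is re-solved inside the inner loops.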

5. Application: Change-Point Detection in Time Series

RuLSIF is applied to change-point detection in multi-dimensional time series $\{y(t)\}_{t=1}^{T}$, $y(t) \in \mathbb{R}^d$, as follows:

Two consecutive retrospective windows of length $n$ are used:
- $\mathcal{Y}(t) = \{y(t), \dots, y(t+n-1)\}$, treated as samples from a distribution $P_t$
- $\mathcal{Y}(t+n) = \{y(t+n), \dots, y(t+2n-1)\}$, treated as samples from a distribution $P_{t+n}$
RuLSIF estimates the relative density ratio $\hat{r}_\alpha$ of $P_t$ to $P_{t+n}$ and the ratio $\hat{r}'_\alpha$ of $P_{t+n}$ to $P_t$.
The empirical α-relative PE divergences, using the dual form, are

$\hat{D}_\mathrm{PE}^\alpha(P_t\|P_{t+n}) = -\frac{\alpha}{2n} \sum_{i=0}^{n-1} \hat{r}_\alpha(y(t+i))^2 - \frac{1-\alpha}{2n} \sum_{i=0}^{n-1} \hat{r}_\alpha(y(t+n+i))^2 + \frac{1}{n} \sum_{i=0}^{n-1} \hat{r}_\alpha(y(t+i)) - \frac{1}{2},$

with an analogous estimate for $\hat{D}_\mathrm{PE}^\alpha(P_{t+n}\|P_t)$.

The symmetric change-point score is $S(t) = \hat{D}_\mathrm{PE}^\alpha(P_t\|P_{t+n}) + \hat{D}_\mathrm{PE}^\alpha(P_{t+n}\|P_t)$.
Sliding $t$ yields a sequence of scores; change-points correspond to peaks in $S(t)$.
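
The sliding-window procedure above can be sketched as follows. This is a simplified illustration under several assumptions: raw observations are used directly as samples, every point of the first window serves as a basis point, the kernel width and regularization strength are fixed rather than cross-validated, and the function names and synthetic series are hypothetical.

```python
import numpy as np

def gaussian_kernel(x, centers, sigma):
    """Gaussian kernel matrix of shape (len(x), len(centers))."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma**2))

def relative_pe_estimate(x_p, x_q, alpha, sigma, lam):
    """Fit the relative ratio of p to q in closed form and plug it into
    the empirical alpha-relative PE divergence."""
    k_p = gaussian_kernel(x_p, x_p, sigma)  # basis points: all p-samples
    k_q = gaussian_kernel(x_q, x_p, sigma)
    h_mat = alpha * k_p.T @ k_p / len(x_p) + (1.0 - alpha) * k_q.T @ k_q / len(x_q)
    theta = np.linalg.solve(h_mat + lam * np.eye(len(x_p)), k_p.mean(axis=0))
    g_p, g_q = k_p @ theta, k_q @ theta
    return (-0.5 * alpha * np.mean(g_p**2)
            - 0.5 * (1.0 - alpha) * np.mean(g_q**2)
            + np.mean(g_p) - 0.5)

def change_point_scores(y, n, alpha=0.1, sigma=1.0, lam=0.1):
    """Symmetric score for each start index t (0-based); the boundary
    between the two windows sits at index t + n."""
    scores = []
    for t in range(len(y) - 2 * n + 1):
        first, second = y[t:t + n], y[t + n:t + 2 * n]
        scores.append(relative_pe_estimate(first, second, alpha, sigma, lam)
                      + relative_pe_estimate(second, first, alpha, sigma, lam))
    return np.array(scores)

# Synthetic series (assumed): the mean shifts from 0 to 5 at index 100.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, size=(100, 1)),
                    rng.normal(5.0, 1.0, size=(100, 1))])
n = 25
scores = change_point_scores(y, n)
peak_t = int(np.argmax(scores))  # window boundary estimate is peak_t + n
```

In this toy series the score is expected to rise as the window boundary approaches the shift. In practice the peak would be compared against a threshold, which this sketch leaves out.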

This approach has been successfully validated on artificial and real-world datasets, including domains of human activity sensing, speech, and Twitter data (Liu et al., 2012).

6. Theoretical Properties and Empirical Performance

RuLSIF offers the following properties:

Boundedness: $r_\alpha(x) \leq 1/\alpha$ for $\alpha > 0$, providing control over high ratio estimates, unlike $r_0(x) = p(x)/q(x)$ which may be unbounded.
Convergence Properties: RuLSIF attains improved non-parametric convergence rates compared to uLSIF (the special case $\alpha = 0$); see [Yamada et al., NIPS 2013, referenced in (Liu et al., 2012)].
Numerical Stability: The analytic solution leads to lower condition numbers and avoids the difficulties of iterative QP solvers.
Robustness: Experimental results demonstrate that RuLSIF yields higher AUCs in change-point detection tasks than both uLSIF- and KLIEP-based approaches on various types of data, indicating increased empirical robustness (Liu et al., 2012).

7. Summary and Extensions

RuLSIF extends uLSIF by introducing the α-mixture denominator in the density-ratio estimate, which bounds and regularizes the ratio, ensuring stability and improved estimation. It maintains the analytic, non-iterative property of uLSIF, leading to efficient computation. Empirical results on synthetic and real-world data show that RuLSIF enables more accurate and stable non-parametric change-point detection compared to uLSIF and KLIEP. The broader significance of RuLSIF lies in its generality for divergence-based applications where robust and scalable density-ratio estimation is essential (Liu et al., 2012).

References (1)

Liu, S., Yamada, M., Collier, N., and Sugiyama, M. (2012). Change-Point Detection in Time-Series Data by Relative Density-Ratio Estimation.
