
Relative uLSIF (RuLSIF)

Updated 16 June 2026
  • Relative uLSIF (RuLSIF) is a non-parametric method that estimates α-relative density ratios using an α-mixture, ensuring bounded estimates and enhanced numerical stability.
  • It employs kernel expansion and squared-loss minimization with regularization, resulting in an analytic closed-form solution that avoids iterative quadratic programming.
  • RuLSIF is effective in change-point detection for time series, demonstrating improved convergence rates, robustness, and computational efficiency over methods like uLSIF and KLIEP.

Relative uLSIF (RuLSIF) is a non-parametric statistical method for direct estimation of α-relative density ratios and their associated divergences between two probability distributions. RuLSIF generalizes the unconstrained Least-Squares Importance Fitting (uLSIF) method by introducing an α-mixture in the denominator of the density ratio, conferring boundedness and improved numerical stability. It is particularly effective in applications such as change-point detection in time-series settings, where robust, analytic, and sample-efficient divergence estimation is required (Liu et al., 2012).

1. Formal Definition of α-Relative Density Ratio and Relative Pearson Divergence

Let $p(x)$ and $q(x)$ be two probability densities on $\mathbb{R}^d$. For a prescribed parameter $0 \leq \alpha < 1$, RuLSIF introduces the α-mixture density

$$p_\alpha(x) = \alpha\,p(x) + (1-\alpha)\,q(x),$$

and defines the α-relative density ratio as

$$r_\alpha(x) = \frac{p(x)}{p_\alpha(x)} = \frac{p(x)}{\alpha\,p(x) + (1-\alpha)\,q(x)}.$$

The associated α-relative Pearson (PE) divergence is then given as:

$$D_\mathrm{PE}^\alpha(p\|q) = \frac{1}{2}\int p_\alpha(x)\,\big(r_\alpha(x) - 1\big)^2\,dx,$$

quantifying the discrepancy between $p(x)$ and $q(x)$ through their α-relative density ratio. For $\alpha = 0$, this reduces to the conventional density-ratio formulation used in uLSIF.
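To make the definitions concrete, the following Python sketch (assuming NumPy is available) evaluates $r_\alpha$ and $D_\mathrm{PE}^\alpha$ on a grid for two hypothetical 1-D Gaussian densities, $p = \mathcal{N}(0, 1)$ and $q = \mathcal{N}(0, 0.5^2)$. The helper names `gaussian_pdf`, `relative_ratio`, and `relative_pe` are illustrative, and the integral is approximated by a simple Riemann sum on $[-8, 8]$. Because $p$ has heavier tails than $q$, $p^2/q$ is not integrable, so the ordinary ($\alpha = 0$) Pearson divergence is infinite here and its grid value only reflects the truncation, whereas any $\alpha > 0$ caps the ratio at $1/\alpha$.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of the 1-D normal distribution N(mean, std^2)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def relative_ratio(p, q, alpha):
    """alpha-relative density ratio r_alpha = p / (alpha * p + (1 - alpha) * q)."""
    return p / (alpha * p + (1.0 - alpha) * q)


def relative_pe(p, q, alpha, dx):
    """alpha-relative Pearson divergence, approximated by a Riemann sum on a uniform grid."""
    p_alpha = alpha * p + (1.0 - alpha) * q
    r = relative_ratio(p, q, alpha)
    return 0.5 * np.sum(p_alpha * (r - 1.0) ** 2) * dx


# Hypothetical example: p = N(0, 1) has heavier tails than q = N(0, 0.5^2),
# so the ordinary ratio p/q (alpha = 0) grows without bound as |x| increases.
x = np.linspace(-8.0, 8.0, 16001)
dx = x[1] - x[0]
p = gaussian_pdf(x, 0.0, 1.0)
q = gaussian_pdf(x, 0.0, 0.5)

for alpha in (0.0, 0.1, 0.5):
    r = relative_ratio(p, q, alpha)
    print(f"alpha={alpha}: max r_alpha = {r.max():.3g}, "
          f"PE_alpha = {relative_pe(p, q, alpha, dx):.4g}")
```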

2. Direct Density-Ratio Estimation via Squared-Loss with Regularization

RuLSIF models the unknown $r_\alpha(x)$ by a kernel expansion:

$$g(x;\theta) = \sum_{\ell=1}^{b} \theta_\ell\, K(x, c_\ell),$$

where $c_1, \dots, c_b$ are basis points, often sampled from $p(x)$ or from both $p(x)$ and $q(x)$, and $K(\cdot,\cdot)$ is a positive-definite kernel, typically Gaussian. The method determines $\theta = (\theta_1, \dots, \theta_b)^\top$ by minimizing the regularized expected squared error:

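$$J(\theta) = \frac{1}{2}\int p_\alpha(x)\,\big(g(x;\theta) - r_\alpha(x)\big)^2\,dx + \frac{\lambda}{2}\,\theta^\top\theta,$$

with regularization parameter $\lambda > 0$. Expanding this and dropping the term that does not depend on $\theta$, the minimization can be expressed as:

$$\min_{\theta}\left[\frac{1}{2}\,\theta^\top H\,\theta - h^\top\theta + \frac{\lambda}{2}\,\theta^\top\theta\right],$$

where

$$H_{\ell\ell'} = \alpha\int p(x)\,K(x, c_\ell)\,K(x, c_{\ell'})\,dx + (1-\alpha)\int q(x)\,K(x, c_\ell)\,K(x, c_{\ell'})\,dx, \qquad h_\ell = \int p(x)\,K(x, c_\ell)\,dx.$$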
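The passage from the squared-error objective to the quadratic form uses a single identity, $p_\alpha(x)\,r_\alpha(x) = p(x)$, which follows directly from the definition of $r_\alpha$. A sketch of the algebra, writing $g(x;\theta) = \sum_{\ell}\theta_\ell K(x, c_\ell)$ for the kernel model and collecting the $\theta$-independent term $C = \frac{1}{2}\int p_\alpha(x)\,r_\alpha(x)^2\,dx$:

```latex
\begin{aligned}
J(\theta) &= \tfrac{1}{2}\int p_\alpha(x)\,g(x;\theta)^2\,dx
  - \int p_\alpha(x)\,r_\alpha(x)\,g(x;\theta)\,dx + C + \tfrac{\lambda}{2}\,\theta^\top\theta \\
&= \tfrac{1}{2}\,\theta^\top H\,\theta - \int p(x)\,g(x;\theta)\,dx + C + \tfrac{\lambda}{2}\,\theta^\top\theta \\
&= \tfrac{1}{2}\,\theta^\top H\,\theta - h^\top\theta + \tfrac{\lambda}{2}\,\theta^\top\theta + C .
\end{aligned}
```

Setting the gradient $H\theta - h + \lambda\theta$ to zero gives $\theta = (H + \lambda I)^{-1} h$, whose empirical counterpart is the closed-form RuLSIF solution.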

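Empirical estimation replaces expectations with averages over samples $\{x_i\}_{i=1}^{n} \sim p$ and $\{x'_j\}_{j=1}^{n'} \sim q$:

$$\hat H_{\ell\ell'} = \frac{\alpha}{n}\sum_{i=1}^{n} K(x_i, c_\ell)\,K(x_i, c_{\ell'}) + \frac{1-\alpha}{n'}\sum_{j=1}^{n'} K(x'_j, c_\ell)\,K(x'_j, c_{\ell'}), \qquad \hat h_\ell = \frac{1}{n}\sum_{i=1}^{n} K(x_i, c_\ell).$$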
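As an illustration of these empirical quantities, the following Python sketch (assuming NumPy) builds $\hat H$ and $\hat h$ with a Gaussian kernel of width $\sigma$. The function names, the 2-D synthetic samples, and the choice of 50 basis points drawn from the $p$-samples are hypothetical choices made for the example.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix with entries exp(-||a_i - b_l||^2 / (2 sigma^2))."""
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def empirical_H_h(x_p, x_q, centers, sigma, alpha):
    """Empirical H (b x b) and h (length b) from samples x_p ~ p and x_q ~ q."""
    K_p = gaussian_kernel(x_p, centers, sigma)  # n x b, entries K(x_i, c_l)
    K_q = gaussian_kernel(x_q, centers, sigma)  # n' x b, entries K(x'_j, c_l)
    H = alpha * (K_p.T @ K_p) / len(x_p) + (1.0 - alpha) * (K_q.T @ K_q) / len(x_q)
    h = K_p.mean(axis=0)
    return H, h


# Hypothetical 2-D samples; basis points are a random subset of the p-samples.
rng = np.random.default_rng(0)
x_p = rng.normal(0.0, 1.0, size=(200, 2))
x_q = rng.normal(0.5, 1.0, size=(300, 2))
centers = x_p[rng.choice(len(x_p), size=50, replace=False)]
H, h = empirical_H_h(x_p, x_q, centers, sigma=1.0, alpha=0.1)
print(H.shape, h.shape)  # (50, 50) (50,)
```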

3. Analytic Solution and Computational Considerations

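Setting the gradient of the empirical objective to zero, RuLSIF yields the closed-form solution:

$$\hat\theta = \big(\hat H + \lambda I_b\big)^{-1}\hat h,$$

where $I_b$ is the $b \times b$ identity matrix. The density-ratio estimate at any $x$ is then

$$\hat r_\alpha(x) = g(x;\hat\theta) = \sum_{\ell=1}^{b} \hat\theta_\ell\, K(x, c_\ell).$$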
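A minimal end-to-end sketch of the closed-form fit and the resulting estimate $\hat r_\alpha(x)$, again in Python with NumPy. It assumes a Gaussian kernel and basis points subsampled from the $p$-samples; `rulsif_fit`, `rulsif_predict`, and all parameter values are illustrative. It calls `np.linalg.solve` on $(\hat H + \lambda I_b)\,\theta = \hat h$ instead of forming the inverse explicitly, which is the usual numerically preferable way to evaluate the same expression.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix with entries exp(-||a_i - b_l||^2 / (2 sigma^2))."""
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def rulsif_fit(x_p, x_q, alpha, sigma, lam, n_centers=100, seed=0):
    """Closed-form RuLSIF fit: theta = (H + lam * I_b)^{-1} h."""
    rng = np.random.default_rng(seed)
    b = min(n_centers, len(x_p))
    centers = x_p[rng.choice(len(x_p), size=b, replace=False)]  # basis points from p
    K_p = gaussian_kernel(x_p, centers, sigma)
    K_q = gaussian_kernel(x_q, centers, sigma)
    H = alpha * (K_p.T @ K_p) / len(x_p) + (1.0 - alpha) * (K_q.T @ K_q) / len(x_q)
    h = K_p.mean(axis=0)
    # Solve the b x b linear system rather than forming the inverse explicitly.
    theta = np.linalg.solve(H + lam * np.eye(b), h)
    return centers, theta


def rulsif_predict(x, centers, theta, sigma):
    """Evaluate r_alpha_hat(x) = sum_l theta_l K(x, c_l) at each row of x."""
    return gaussian_kernel(x, centers, sigma) @ theta


rng = np.random.default_rng(1)
x_p = rng.normal(0.0, 1.0, size=(500, 1))  # hypothetical samples from p = N(0, 1)
x_q = rng.normal(2.0, 1.0, size=(500, 1))  # hypothetical samples from q = N(2, 1)
centers, theta = rulsif_fit(x_p, x_q, alpha=0.1, sigma=1.0, lam=0.01)
print(rulsif_predict(np.array([[-1.0], [3.0]]), centers, theta, sigma=1.0))
```

For these hypothetical densities the true ratio with $\alpha = 0.1$ is roughly 8.6 at $x = -1$ and about 0.02 at $x = 3$, so the printed estimates should show the same ordering; the ridge term shrinks $\theta$ toward zero and can bias the values.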

Dominant computational costs are the formation of the $b \times b$ matrix $\hat H$ ($O\big((n+n')\,b^2\big)$) and its inversion ($O(b^3)$). The number of basis points $b$ is often kept much smaller than the sample size ($b \ll n$) by subsampling or using reduced kernel centers. Unlike KLIEP, which is solved by iterative optimization, RuLSIF requires no iterative solver, resulting in faster computation and greater numerical stability for large-scale problems.

4. Algorithmic Implementation Details

Key components in practical deployment are summarized as follows:

| Component | Common Choices / Procedures | Notes |
|---|---|---|
| Kernel $K$ | Gaussian, $K(x, x') = \exp\!\big(-\lVert x - x'\rVert^2 / (2\sigma^2)\big)$ | Positive-definite; width $\sigma$ typically cross-validated |
| Basis points $c_1, \dots, c_b$ | All $p$-samples or a random subset | Subsampling manages computational cost |
| $\alpha$ | Small value | Ensures $r_\alpha(x) \le 1/\alpha$ |
| $\sigma$, $\lambda$ | K-fold cross-validation | Based on minimizing the hold-out squared error $J$ |
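The cross-validation row of the table can be sketched as follows, assuming NumPy and a Gaussian kernel. Each candidate $(\sigma, \lambda)$ is scored by the held-out value of the squared-error objective with its constant term dropped: $\frac{\alpha}{2}$ times the mean of $g^2$ over held-out $p$-samples, plus $\frac{1-\alpha}{2}$ times the mean of $g^2$ over held-out $q$-samples, minus the mean of $g$ over held-out $p$-samples. The grid, the five folds, the function names, and keeping the basis points fixed across folds are illustrative assumptions.

```python
import numpy as np
from itertools import product


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix with entries exp(-||a_i - b_l||^2 / (2 sigma^2))."""
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def holdout_objective(theta, K_p_te, K_q_te, alpha):
    """Held-out squared-error objective (constant term dropped); lower is better."""
    g_p, g_q = K_p_te @ theta, K_q_te @ theta
    return 0.5 * alpha * np.mean(g_p**2) + 0.5 * (1.0 - alpha) * np.mean(g_q**2) - np.mean(g_p)


def select_sigma_lambda(x_p, x_q, alpha, sigmas, lams, n_folds=5, n_centers=100, seed=0):
    """K-fold cross-validation over a (sigma, lambda) grid; returns the best pair."""
    rng = np.random.default_rng(seed)
    b = min(n_centers, len(x_p))
    centers = x_p[rng.choice(len(x_p), size=b, replace=False)]  # kept fixed across folds
    fold_p = rng.permutation(len(x_p)) % n_folds  # fold label of each p-sample
    fold_q = rng.permutation(len(x_q)) % n_folds  # fold label of each q-sample
    best_score, best_pair = np.inf, None
    for sigma, lam in product(sigmas, lams):
        K_p = gaussian_kernel(x_p, centers, sigma)
        K_q = gaussian_kernel(x_q, centers, sigma)
        fold_scores = []
        for k in range(n_folds):
            tr_p, tr_q = fold_p != k, fold_q != k  # training masks for fold k
            H = (alpha * (K_p[tr_p].T @ K_p[tr_p]) / tr_p.sum()
                 + (1.0 - alpha) * (K_q[tr_q].T @ K_q[tr_q]) / tr_q.sum())
            h = K_p[tr_p].mean(axis=0)
            theta = np.linalg.solve(H + lam * np.eye(b), h)
            fold_scores.append(holdout_objective(theta, K_p[~tr_p], K_q[~tr_q], alpha))
        if np.mean(fold_scores) < best_score:
            best_score, best_pair = np.mean(fold_scores), (sigma, lam)
    return best_pair


rng = np.random.default_rng(2)
x_p = rng.normal(0.0, 1.0, size=(300, 1))  # hypothetical samples from p
x_q = rng.normal(1.0, 1.5, size=(300, 1))  # hypothetical samples from q
sigma, lam = select_sigma_lambda(x_p, x_q, alpha=0.1,
                                 sigmas=[0.3, 1.0, 3.0], lams=[1e-3, 1e-2, 1e-1])
print("selected sigma, lambda:", sigma, lam)
```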

Computing $\hat H$ and $\hat h$ takes $O\big((n+n')\,b^2\big)$ time, and solving the linear system takes $O(b^3)$. This approach offers significant advantages in runtime and scalability compared with methods that require iterative optimization.

5. Application: Change-Point Detection in Time Series

RuLSIF is applied to change-point detection in multi-dimensional time series $y(t) \in \mathbb{R}^d$ as follows:

  1. Two consecutive retrospective windows of length $n$ are used:
    • a reference window $\{y(t), \dots, y(t+n-1)\}$, treated as samples $\{x_i\}_{i=1}^{n}$ from $p$;
    • a test window $\{y(t+n), \dots, y(t+2n-1)\}$, treated as samples $\{x'_j\}_{j=1}^{n}$ from $q$.
  2. RuLSIF estimates the α-relative ratio $\hat r_\alpha$ of $p$ to $q$ and, with the two windows swapped, the α-relative ratio of $q$ to $p$.
  3. The empirical α-relative PE divergence, computed from the dual (variational) form, is

$$\widehat{D}_\mathrm{PE}^\alpha(p\|q) = -\frac{\alpha}{2n}\sum_{i=1}^{n}\hat r_\alpha(x_i)^2 - \frac{1-\alpha}{2n}\sum_{j=1}^{n}\hat r_\alpha(x'_j)^2 + \frac{1}{n}\sum_{i=1}^{n}\hat r_\alpha(x_i) - \frac{1}{2},$$

with an analogous estimate for $\widehat{D}_\mathrm{PE}^\alpha(q\|p)$ obtained by swapping the roles of the two windows.

  4. The symmetric change-point score is $\mathrm{score}(t) = \widehat{D}_\mathrm{PE}^\alpha(p\|q) + \widehat{D}_\mathrm{PE}^\alpha(q\|p)$.
  5. Sliding $t$ forward yields a sequence of scores; change-points correspond to peaks in $\mathrm{score}(t)$.
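Steps 1 to 5 can be sketched in Python (assuming NumPy) as below. For simplicity this illustrative version uses every sample of the numerator window as a basis point, fixes $\sigma$ and $\lambda$ instead of cross-validating them, and treats individual observations $y(t)$ as samples, whereas Liu et al. (2012) build samples from short subsequences of consecutive observations. The function names, default parameter values, and the synthetic mean-shift series are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix with entries exp(-||a_i - b_l||^2 / (2 sigma^2))."""
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def relative_pe_estimate(x_p, x_q, alpha, sigma, lam):
    """Fit RuLSIF with basis points x_p, then return the dual-form PE_alpha(p||q) estimate."""
    K_p = gaussian_kernel(x_p, x_p, sigma)
    K_q = gaussian_kernel(x_q, x_p, sigma)
    H = alpha * (K_p.T @ K_p) / len(x_p) + (1.0 - alpha) * (K_q.T @ K_q) / len(x_q)
    h = K_p.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(x_p)), h)
    g_p, g_q = K_p @ theta, K_q @ theta  # r_alpha_hat on the p- and q-samples
    return (-0.5 * alpha * np.mean(g_p**2) - 0.5 * (1.0 - alpha) * np.mean(g_q**2)
            + np.mean(g_p) - 0.5)


def change_point_scores(y, n, alpha=0.1, sigma=1.0, lam=0.1):
    """score(t) = PE_alpha(ref || test) + PE_alpha(test || ref) for consecutive length-n windows."""
    scores = []
    for t in range(len(y) - 2 * n + 1):
        ref, test = y[t:t + n], y[t + n:t + 2 * n]
        scores.append(relative_pe_estimate(ref, test, alpha, sigma, lam)
                      + relative_pe_estimate(test, ref, alpha, sigma, lam))
    return np.array(scores)


# Hypothetical 1-D series whose mean jumps from 0 to 3 at index 200.
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, (200, 1)), rng.normal(3.0, 1.0, (200, 1))])
n = 50
scores = change_point_scores(y, n)
print("estimated change-point:", int(np.argmax(scores)) + n)
```

Here `scores[t]` compares the windows that meet at index `t + n`, so the peak should appear when that boundary is near the true change at index 200.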

This approach has been successfully validated on artificial and real-world datasets, including domains of human activity sensing, speech, and Twitter data (Liu et al., 2012).

6. Theoretical Properties and Empirical Performance

RuLSIF offers the following properties:

  • Boundedness: $r_\alpha(x) \le 1/\alpha$ for $\alpha > 0$, providing control over high ratio estimates, unlike $r_0(x) = p(x)/q(x)$, which may be unbounded.
  • Convergence Properties: RuLSIF attains improved non-parametric convergence rates compared to uLSIF (the special case $\alpha = 0$); see [Yamada et al., NIPS 2011, referenced in (Liu et al., 2012)].
  • Numerical Stability: The ridge term $\lambda I_b$ keeps the linear system well conditioned, and the analytic solution avoids the convergence difficulties of iterative solvers.
  • Robustness: Experimental results demonstrate that RuLSIF yields higher AUCs in change-point detection tasks than both uLSIF- and KLIEP-based approaches on various types of data, indicating increased empirical robustness (Liu et al., 2012).

7. Summary and Extensions

RuLSIF extends uLSIF by introducing the α-mixture denominator in the density-ratio estimate, which bounds and regularizes the ratio, ensuring stability and improved estimation. It maintains the analytic, non-iterative property of uLSIF, leading to efficient computation. Empirical results on synthetic and real-world data show that RuLSIF enables more accurate and stable non-parametric change-point detection compared to uLSIF and KLIEP. The broader significance of RuLSIF lies in its generality for divergence-based applications where robust and scalable density-ratio estimation is essential (Liu et al., 2012).
