Relative uLSIF (RuLSIF)
- Relative uLSIF (RuLSIF) is a non-parametric method that estimates α-relative density ratios using an α-mixture, ensuring bounded estimates and enhanced numerical stability.
- It employs kernel expansion and squared-loss minimization with regularization, resulting in an analytic closed-form solution that avoids iterative quadratic programming.
- RuLSIF is effective in change-point detection for time series, demonstrating improved convergence rates, robustness, and computational efficiency over methods like uLSIF and KLIEP.
Relative uLSIF (RuLSIF) is a non-parametric statistical method for direct estimation of α-relative density ratios and their associated divergences between two probability distributions. RuLSIF generalizes the unconstrained Least-Squares Importance Fitting (uLSIF) method by introducing an α-mixture in the denominator of the density ratio, conferring boundedness and improved numerical stability. It is particularly effective in applications such as change-point detection in time-series settings, where robust, analytic, and sample-efficient divergence estimation is required (Liu et al., 2012).
1. Formal Definition of α-Relative Density Ratio and Relative Pearson Divergence
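Let $p(x)$ and $p'(x)$ be two probability densities on $\mathbb{R}^d$. For a prescribed parameter $\alpha \in [0, 1)$, RuLSIF introduces the α-mixture density
$$q_\alpha(x) = \alpha\, p(x) + (1 - \alpha)\, p'(x)$$
and defines the α-relative density ratio as
$$r_\alpha(x) = \frac{p(x)}{q_\alpha(x)} = \frac{p(x)}{\alpha\, p(x) + (1 - \alpha)\, p'(x)}.$$
The associated α-relative Pearson (PE) divergence is then given as:
$$\mathrm{PE}_\alpha(P \,\|\, P') = \frac{1}{2} \int q_\alpha(x) \left( r_\alpha(x) - 1 \right)^2 dx,$$
quantifying the discrepancy between $p$ and $p'$ through their α-relative density ratio. For $\alpha = 0$, this reduces to the conventional density-ratio formulation used in uLSIF.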
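To make the definitions concrete, the following is a minimal numerical sketch that evaluates the α-relative ratio and the α-relative PE divergence on a grid. The two unit-variance Gaussians, the grid, and the helper names (`gaussian_pdf`, `relative_ratio`, `relative_pe`) are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def relative_ratio(p, p_prime, alpha):
    """alpha-relative density ratio p / (alpha * p + (1 - alpha) * p')."""
    return p / (alpha * p + (1.0 - alpha) * p_prime)

def relative_pe(p, p_prime, alpha, dx):
    """alpha-relative PE divergence, approximated by a Riemann sum on the grid."""
    q = alpha * p + (1.0 - alpha) * p_prime
    r = p / q
    return 0.5 * np.sum(q * (r - 1.0) ** 2) * dx

# Assumed toy setting: p = N(0, 1) and p' = N(1, 1) on a fine grid.
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
p = gaussian_pdf(x, 0.0, 1.0)
p_prime = gaussian_pdf(x, 1.0, 1.0)

r_plain = relative_ratio(p, p_prime, alpha=0.0)  # ordinary ratio p / p'
r_rel = relative_ratio(p, p_prime, alpha=0.1)    # relative ratio, capped at 1 / alpha

pe_plain = relative_pe(p, p_prime, 0.0, dx)
pe_rel = relative_pe(p, p_prime, 0.1, dx)

print("max plain ratio:   ", r_plain.max())
print("max relative ratio:", r_rel.max())
print("PE_0:", pe_plain, " PE_0.1:", pe_rel)
```

On this grid the plain ratio grows very large in the left tail, while the relative ratio never exceeds $1/\alpha = 10$. For this particular pair of Gaussians the $\alpha = 0$ divergence has the closed form $(e - 1)/2$, which the grid approximation should reproduce closely.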
2. Direct Density-Ratio Estimation via Squared-Loss with Regularization
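RuLSIF models the unknown $r_\alpha(x)$ by a kernel expansion:
$$g(x; \theta) = \sum_{\ell=1}^{b} \theta_\ell\, K(x, c_\ell) = \theta^\top k(x),$$
where $\{c_\ell\}_{\ell=1}^{b}$ are basis points (often sampled from the $p$-samples, or from both the $p$- and $p'$-samples), $k(x) = (K(x, c_1), \ldots, K(x, c_b))^\top$ is the vector of kernel values, and $K$ is a positive-definite kernel, typically Gaussian. The method determines $\theta$ by minimizing the regularized expected squared error under the α-mixture density $q_\alpha$:
$$J(\theta) = \frac{1}{2}\, \mathbb{E}_{q_\alpha}\!\left[ \left( g(x; \theta) - r_\alpha(x) \right)^2 \right] + \frac{\lambda}{2}\, \theta^\top \theta,$$
with regularization parameter $\lambda > 0$. Expanding this and dropping the term that does not depend on $\theta$, the minimization can be expressed as:
$$\min_{\theta} \left[ \frac{1}{2}\, \theta^\top H\, \theta - h^\top \theta + \frac{\lambda}{2}\, \theta^\top \theta \right],$$
where
$$H = \alpha\, \mathbb{E}_{p}\!\left[ k(x)\, k(x)^\top \right] + (1 - \alpha)\, \mathbb{E}_{p'}\!\left[ k(x)\, k(x)^\top \right], \qquad h = \mathbb{E}_{p}\!\left[ k(x) \right].$$
Empirical estimation replaces expectations with averages over samples $\{x_i\}_{i=1}^{n} \sim p$ and $\{x'_j\}_{j=1}^{n'} \sim p'$:
$$\hat{H} = \frac{\alpha}{n} \sum_{i=1}^{n} k(x_i)\, k(x_i)^\top + \frac{1 - \alpha}{n'} \sum_{j=1}^{n'} k(x'_j)\, k(x'_j)^\top, \qquad \hat{h} = \frac{1}{n} \sum_{i=1}^{n} k(x_i).$$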
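The step from the squared error to the quadratic form can be spelled out in a few lines. The derivation below is a sketch that assumes the notation $q_\alpha = \alpha p + (1 - \alpha) p'$, $r_\alpha = p / q_\alpha$, and a linear-in-parameters model $g(x; \theta) = \theta^\top k(x)$, with $k(x)$ the vector of kernel values at the basis points. The key identity is $q_\alpha(x)\, r_\alpha(x) = p(x)$, which removes the unknown ratio from the cross term:

$$
\begin{aligned}
\frac{1}{2} \int q_\alpha(x) \left( g(x; \theta) - r_\alpha(x) \right)^2 dx
&= \frac{1}{2} \int q_\alpha\, g^2\, dx - \int q_\alpha\, r_\alpha\, g\, dx + \frac{1}{2} \int q_\alpha\, r_\alpha^2\, dx \\
&= \frac{\alpha}{2}\, \mathbb{E}_{p}\!\left[ g^2 \right] + \frac{1 - \alpha}{2}\, \mathbb{E}_{p'}\!\left[ g^2 \right] - \mathbb{E}_{p}\!\left[ g \right] + \text{const} \\
&= \frac{1}{2}\, \theta^\top \left( \alpha\, \mathbb{E}_{p}\!\left[ k k^\top \right] + (1 - \alpha)\, \mathbb{E}_{p'}\!\left[ k k^\top \right] \right) \theta - \mathbb{E}_{p}\!\left[ k \right]^\top \theta + \text{const}.
\end{aligned}
$$

The constant $\frac{1}{2} \int q_\alpha r_\alpha^2\, dx$ does not depend on $\theta$, so the objective involves only expectations under $p$ and $p'$, both of which can be approximated by sample averages.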
3. Analytic Solution and Computational Considerations
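Setting the gradient of the empirical objective with respect to $\theta$ to zero, RuLSIF yields the closed-form solution:
$$\hat{\theta} = \left( \hat{H} + \lambda I_b \right)^{-1} \hat{h},$$
where $I_b$ is the $b \times b$ identity matrix. The density-ratio estimate at any $x$ is then
$$\hat{r}_\alpha(x) = g(x; \hat{\theta}) = \sum_{\ell=1}^{b} \hat{\theta}_\ell\, K(x, c_\ell).$$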
Dominant computational costs are the formation of the $b \times b$ matrix $\hat{H}$ ($O((n + n')\, b^2)$) and its inversion ($O(b^3)$). The number of basis points $b$ is often set such that $b \ll n$ using subsampling or reduced kernel centers. Unlike KLIEP, which relies on iterative optimization, RuLSIF needs only a single linear solve, resulting in faster computation and greater numerical stability for large-scale problems.
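The closed-form estimator can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`gaussian_kernel`, `rulsif_fit`, `rulsif_ratio`), the fixed values of `sigma` and `lam`, the choice of the first 50 $p$-samples as basis points, and the synthetic Gaussian data are all hypothetical.

```python
import numpy as np

def gaussian_kernel(x, centers, sigma):
    """Kernel matrix with entries exp(-||x_i - c_l||^2 / (2 sigma^2))."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def rulsif_fit(x_p, x_q, centers, alpha=0.1, sigma=1.0, lam=0.1):
    """Solve (H_hat + lam * I) theta = h_hat for the kernel weights.

    x_p holds samples from p (numerator), x_q holds samples from p'.
    """
    k_p = gaussian_kernel(x_p, centers, sigma)  # shape (n, b)
    k_q = gaussian_kernel(x_q, centers, sigma)  # shape (n', b)
    H = alpha * k_p.T @ k_p / len(x_p) + (1.0 - alpha) * k_q.T @ k_q / len(x_q)
    h = k_p.mean(axis=0)
    # A linear solve is used instead of forming the explicit inverse.
    return np.linalg.solve(H + lam * np.eye(len(centers)), h)

def rulsif_ratio(x, centers, theta, sigma=1.0):
    """Evaluate the estimated alpha-relative ratio at the rows of x."""
    return gaussian_kernel(x, centers, sigma) @ theta

# Assumed toy data: p = N(0, 1), p' = N(0.5, 1), one-dimensional.
rng = np.random.default_rng(0)
x_p = rng.normal(0.0, 1.0, size=(200, 1))
x_q = rng.normal(0.5, 1.0, size=(200, 1))
centers = x_p[:50]  # b = 50 basis points taken from the p-samples

theta = rulsif_fit(x_p, x_q, centers, alpha=0.1, sigma=1.0, lam=0.1)
ratio_at_p = rulsif_ratio(x_p, centers, theta, sigma=1.0)
print(ratio_at_p[:5])
```

Using `np.linalg.solve` rather than an explicit matrix inverse is a design choice made here for numerical reasons; the cost remains cubic in the number of basis points.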
4. Algorithmic Implementation Details
Key components in practical deployment are summarized as follows:
| Component | Common Choices / Procedures | Notes |
|---|---|---|
| Kernel $K$ | Gaussian ($K(x, x') = \exp(-\lVert x - x' \rVert^2 / (2\sigma^2))$) | Positive-definite; $\sigma$ typically cross-validated |
| Basis points $\{c_\ell\}$ | All $p$-samples or random subset | Subsampling manages computational cost |
| $\alpha$ | Small value (e.g., $0.1$) | Ensures $r_\alpha(x) \le 1/\alpha$ |
| $\sigma$, $\lambda$ | K-fold cross-validation | Based on minimizing the hold-out squared-error objective $\hat{J}$ |
The computation of $\hat{H}$ and $\hat{h}$ is $O((n + n')\, b^2)$, and solution of the linear system is $O(b^3)$. This approach offers significant advantages in runtime and scalability compared to iterative QP methods.
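The cross-validation step in the table can be sketched as a grid search over kernel width and regularization strength, scored by the hold-out squared-error objective. The candidate grids, fold count, helper names, and synthetic data below are assumptions for illustration. As a simplification, the basis points are fixed before splitting, so held-out points may also serve as kernel centers.

```python
import numpy as np

def gaussian_kernel(x, centers, sigma):
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def fit_theta(k_p, k_q, alpha, lam):
    """Closed-form weights from precomputed kernel matrices."""
    H = alpha * k_p.T @ k_p / len(k_p) + (1.0 - alpha) * k_q.T @ k_q / len(k_q)
    h = k_p.mean(axis=0)
    return np.linalg.solve(H + lam * np.eye(H.shape[0]), h)

def holdout_objective(k_p, k_q, theta, alpha):
    """Unregularized squared-error objective evaluated on held-out samples."""
    g_p, g_q = k_p @ theta, k_q @ theta
    return (0.5 * alpha * np.mean(g_p ** 2)
            + 0.5 * (1.0 - alpha) * np.mean(g_q ** 2)
            - np.mean(g_p))

def select_sigma_lambda(x_p, x_q, centers, alpha, sigmas, lambdas, n_folds=5, seed=0):
    """Return (best_score, best_sigma, best_lambda) from K-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds_p = np.array_split(rng.permutation(len(x_p)), n_folds)
    folds_q = np.array_split(rng.permutation(len(x_q)), n_folds)
    best = (np.inf, None, None)
    for sigma in sigmas:
        # Kernel matrices depend only on sigma, so compute them once per width.
        k_p = gaussian_kernel(x_p, centers, sigma)
        k_q = gaussian_kernel(x_q, centers, sigma)
        for lam in lambdas:
            fold_scores = []
            for test_p, test_q in zip(folds_p, folds_q):
                train_p = np.setdiff1d(np.arange(len(x_p)), test_p)
                train_q = np.setdiff1d(np.arange(len(x_q)), test_q)
                theta = fit_theta(k_p[train_p], k_q[train_q], alpha, lam)
                fold_scores.append(holdout_objective(k_p[test_p], k_q[test_q], theta, alpha))
            mean_score = float(np.mean(fold_scores))
            if mean_score < best[0]:
                best = (mean_score, sigma, lam)
    return best

# Assumed toy data and candidate grids.
rng = np.random.default_rng(1)
x_p = rng.normal(0.0, 1.0, size=(150, 1))
x_q = rng.normal(0.5, 1.0, size=(150, 1))
sigmas = [0.3, 1.0, 3.0]
lambdas = [0.01, 0.1, 1.0]
best_score, best_sigma, best_lam = select_sigma_lambda(
    x_p, x_q, centers=x_p[:50], alpha=0.1, sigmas=sigmas, lambdas=lambdas)
print("selected sigma:", best_sigma, "selected lambda:", best_lam)
```

Because the solution is analytic, each (width, regularization, fold) combination costs only one small linear solve, which is what makes an exhaustive grid search practical.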
5. Application: Change-Point Detection in Time Series
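RuLSIF is applied to change-point detection in multi-dimensional time series $\{y(t)\}$, $y(t) \in \mathbb{R}^d$, as follows:
- Two consecutive retrospective windows of length $n$ are used:
  - $\mathcal{Y}(t) = \{y(t), y(t+1), \ldots, y(t+n-1)\}$
  - $\mathcal{Y}(t+n) = \{y(t+n), y(t+n+1), \ldots, y(t+2n-1)\}$
- RuLSIF estimates the α-relative density ratio of $P_t$ to $P_{t+n}$ and that of $P_{t+n}$ to $P_t$, where $P_t$ and $P_{t+n}$ denote the distributions of the samples in $\mathcal{Y}(t)$ and $\mathcal{Y}(t+n)$.
- The empirical α-relative PE divergences, using the dual form, are
  $$\widehat{\mathrm{PE}}_\alpha(P_t \,\|\, P_{t+n}) = -\frac{\alpha}{2n} \sum_{i=1}^{n} \hat{g}(x_i)^2 - \frac{1 - \alpha}{2n} \sum_{j=1}^{n} \hat{g}(x'_j)^2 + \frac{1}{n} \sum_{i=1}^{n} \hat{g}(x_i) - \frac{1}{2},$$
  where $x_i \in \mathcal{Y}(t)$, $x'_j \in \mathcal{Y}(t+n)$, and $\hat{g}$ is the fitted ratio model, with an analogous estimate for $\widehat{\mathrm{PE}}_\alpha(P_{t+n} \,\|\, P_t)$.
- The symmetric change-point score is $s(t) = \widehat{\mathrm{PE}}_\alpha(P_t \,\|\, P_{t+n}) + \widehat{\mathrm{PE}}_\alpha(P_{t+n} \,\|\, P_t)$.
- Sliding $t$ yields a sequence of scores; change-points correspond to peaks in $s(t)$.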
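The sliding-window procedure above can be sketched as follows. This is a simplified illustration under several assumptions: raw samples are used directly as window elements, all samples of the numerator window serve as basis points, the kernel width and regularization are fixed rather than cross-validated, and the series is a synthetic one-dimensional signal with a single mean shift at index 100. Function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(x, centers, sigma):
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def relative_pe_estimate(x_p, x_q, alpha=0.1, sigma=1.0, lam=0.1):
    """Empirical alpha-relative PE divergence of x_p relative to x_q."""
    centers = x_p  # assumption: every numerator sample is a basis point
    k_p = gaussian_kernel(x_p, centers, sigma)
    k_q = gaussian_kernel(x_q, centers, sigma)
    H = alpha * k_p.T @ k_p / len(x_p) + (1.0 - alpha) * k_q.T @ k_q / len(x_q)
    h = k_p.mean(axis=0)
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    g_p, g_q = k_p @ theta, k_q @ theta
    return (-0.5 * alpha * np.mean(g_p ** 2)
            - 0.5 * (1.0 - alpha) * np.mean(g_q ** 2)
            + np.mean(g_p) - 0.5)

def change_point_scores(series, n, alpha=0.1, sigma=1.0, lam=0.1):
    """Symmetric score for every position t of two adjacent length-n windows."""
    scores = []
    for t in range(len(series) - 2 * n + 1):
        first = series[t:t + n]
        second = series[t + n:t + 2 * n]
        forward = relative_pe_estimate(first, second, alpha, sigma, lam)
        backward = relative_pe_estimate(second, first, alpha, sigma, lam)
        scores.append(forward + backward)
    return np.array(scores)

# Assumed synthetic series: the mean jumps from 0 to 6 at index 100.
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0.0, 1.0, 100),
                         rng.normal(6.0, 1.0, 100)])[:, None]
n = 30
scores = change_point_scores(series, n)
peak_t = int(np.argmax(scores))
# The boundary between the two windows sits at index t + n.
print("estimated change near index", peak_t + n)
```

The score is largest when the boundary between the two windows coincides with the change, so the peak position plus the window length gives the estimated change location. In practice a threshold or peak-picking rule would be applied to the score sequence; that step is outside this sketch.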
This approach has been successfully validated on artificial and real-world datasets, including domains of human activity sensing, speech, and Twitter data (Liu et al., 2012).
6. Theoretical Properties and Empirical Performance
RuLSIF offers the following properties:
- Boundedness: $r_\alpha(x) \le 1/\alpha$ for $\alpha > 0$, providing control over high ratio estimates, unlike $r_0(x) = p(x)/p'(x)$ which may be unbounded.
- Convergence Properties: RuLSIF attains improved non-parametric convergence rates compared to uLSIF (the special case $\alpha = 0$); see [Yamada et al., NIPS 2013, referenced in (Liu et al., 2012)].
- Numerical Stability: The analytic solution leads to lower condition numbers and avoids the difficulties of iterative QP solvers.
- Robustness: Experimental results demonstrate that RuLSIF yields higher AUCs in change-point detection tasks than both uLSIF- and KLIEP-based approaches on various types of data, indicating increased empirical robustness (Liu et al., 2012).
7. Summary and Extensions
RuLSIF extends uLSIF by introducing the α-mixture denominator in the density-ratio estimate, which bounds and regularizes the ratio, ensuring stability and improved estimation. It maintains the analytic, non-iterative property of uLSIF, leading to efficient computation. Empirical results on synthetic and real-world data show that RuLSIF enables more accurate and stable non-parametric change-point detection compared to uLSIF and KLIEP. The broader significance of RuLSIF lies in its generality for divergence-based applications where robust and scalable density-ratio estimation is essential (Liu et al., 2012).