
Minimax Optimal Kernel Two-Sample Tests with Random Features (2502.20755v1)

Published 28 Feb 2025 in math.ST, cs.LG, stat.ML, and stat.TH

Abstract: Reproducing Kernel Hilbert Space (RKHS) embedding of probability distributions has proved to be an effective approach, via MMD (maximum mean discrepancy), for nonparametric hypothesis testing problems involving distributions defined over general (non-Euclidean) domains. While a substantial amount of work has been done on this topic, only recently have minimax optimal two-sample tests been constructed that incorporate, unlike MMD, both the mean element and a regularized version of the covariance operator. However, as with most kernel algorithms, the computational complexity of the optimal test scales cubically in the sample size, limiting its applicability. In this paper, we propose a spectral regularized two-sample test based on random Fourier feature (RFF) approximation and investigate the trade-offs between statistical optimality and computational efficiency. We show the proposed test to be minimax optimal if the approximation order of RFF (which depends on the smoothness of the likelihood ratio and the decay rate of the eigenvalues of the integral operator) is sufficiently large. We develop a practically implementable permutation-based version of the proposed test with a data-adaptive strategy for selecting the regularization parameter and the kernel. Finally, through numerical experiments on simulated and benchmark datasets, we demonstrate that the proposed RFF-based test is computationally efficient and performs almost similarly (with a small drop in power) to the exact test.

Summary

Overview of "Minimax Optimal Kernel Two-Sample Tests with Random Features"

The paper "Minimax Optimal Kernel Two-Sample Tests with Random Features" by Soumya Mukherjee and Bharath K. Sriperumbudur presents a novel approach to enhance the computational feasibility of minimax optimal kernel-based two-sample tests using Random Fourier Features (RFF). The work focuses on balancing computational efficiency and statistical performance, a critical trade-off in high-dimensional data analysis.

In kernel two-sample testing, the Maximum Mean Discrepancy (MMD) is a popular nonparametric statistic defined via Reproducing Kernel Hilbert Space (RKHS) embeddings. However, tests based on MMD alone are not minimax optimal, since they compare only the mean embeddings of the two distributions and are insensitive to certain local discrepancies. Recent work achieves minimax optimality by incorporating both the mean embedding and a regularized version of the covariance operator, but at a significant computational cost: the exact test scales cubically in the sample size. The authors aim to reduce this cost by employing RFF approximations, which have long been used to make kernel computations efficient in large-scale problems.
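
For orientation, the standard population form of MMD (a textbook definition, not specific to this paper) is

```latex
\mathrm{MMD}^2(P, Q)
  \;=\; \lVert \mu_P - \mu_Q \rVert_{\mathcal{H}}^2
  \;=\; \mathbb{E}\,k(X, X') \;-\; 2\,\mathbb{E}\,k(X, Y) \;+\; \mathbb{E}\,k(Y, Y'),
```

with X, X' ~ P and Y, Y' ~ Q independent, and mu_P = E[k(., X)] the mean embedding of P in the RKHS H. A test built on this quantity compares only mean embeddings; the minimax optimal tests discussed here additionally normalize by a regularized covariance operator.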

Key contributions of this research include:

  1. Computational Efficiency: The paper introduces an RFF-based spectral regularized two-sample test that replaces the cubic-in-sample-size cost of the exact test with a substantially cheaper computation, making it suitable for large datasets. This is achieved by approximating the kernel with a finite number of randomly sampled Fourier features (see the sketch after this list).
  2. Trade-off Analysis: A comprehensive theoretical analysis is provided, detailing the conditions under which the RFF-based test retains minimax optimality. These conditions are contingent upon the approximation order of RFF, dictated by the smoothness of the likelihood ratio and the decay rate of the integral operator's eigenvalues.
  3. Adaptive Implementation: The authors propose a permutation-based implementation that selects the regularization parameter and the kernel adaptively from the data (the permutation step is also illustrated in the sketch after this list). This obviates the manual, dataset-specific tuning such tests otherwise require.
  4. Empirical Validation: The paper demonstrates the RFF-based test on synthetic and benchmark datasets, showing comparable statistical performance to exact tests, albeit with a minor reduction in power. The empirical studies affirm the method's computational advantages across various scenarios.
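
To make items 1 and 3 concrete, here is a minimal, hypothetical Python sketch of the general recipe: a Gaussian-kernel RFF map, a plug-in MMD-type statistic computed in the feature space, and permutation calibration. It illustrates only the RFF and permutation mechanics; the paper's actual test uses a spectral regularized statistic and a data-adaptive choice of kernel and regularization parameter, which are not reproduced here. All function names and parameter defaults below are illustrative.

```python
import numpy as np

def rff_features(X, W, b):
    """Map samples X of shape (n, d) to random Fourier features of shape (n, D).

    With W (d, D) drawn i.i.d. N(0, 1/sigma^2) and b (D,) Uniform[0, 2*pi],
    z(x)^T z(y) approximates the Gaussian kernel exp(-||x-y||^2 / (2*sigma^2))
    (Rahimi-Recht-style features).
    """
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def rff_permutation_test(X, Y, num_features=200, sigma=1.0, n_perms=500, seed=0):
    """Two-sample test: plug-in MMD^2 in RFF space, calibrated by permutation."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    n = X.shape[0]
    Z = rff_features(np.vstack([X, Y]), W, b)  # featurize the pooled sample once

    def mmd2(Z1, Z2):
        # Squared distance between empirical mean embeddings in feature space.
        diff = Z1.mean(axis=0) - Z2.mean(axis=0)
        return float(diff @ diff)

    observed = mmd2(Z[:n], Z[n:])
    null_stats = [
        mmd2(Z[perm[:n]], Z[perm[n:]])
        for perm in (rng.permutation(len(Z)) for _ in range(n_perms))
    ]
    # Permutation p-value with the standard +1 correction.
    p_value = (1 + sum(s >= observed for s in null_stats)) / (1 + n_perms)
    return observed, p_value
```

For a fixed feature dimension D, the dominant costs are O(nDd) to featurize the data and O(nD) per permutation, which is the linear-in-sample-size behavior that replaces the cubic scaling of the exact test; the paper's analysis quantifies how large D must grow for the cheaper statistic to remain minimax optimal.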

The paper also discusses how it differs from related work, particularly emphasizing the flexibility of its kernel assumptions and its characterization of the alternatives. Unlike Choi and Kim (2024), who impose restrictions on the kernel to obtain their theoretical results, Mukherjee and Sriperumbudur's method supports a broader class of kernels and divergence structures.

Theoretically, the paper expands the function space over which minimax optimality is sought by indexing alternatives with a range of fractional powers of an integral operator. Practically, the RFF approach offers a viable solution in settings where computational resources are limited and large sample sizes are needed for robust statistical inference.
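
To unpack that statement: in this line of work, the smoothness of an alternative is commonly encoded through fractional powers of the integral operator associated with the kernel. A hedged sketch of the usual condition, with notation assumed here rather than taken verbatim from the paper:

```latex
% T is the integral operator induced by the kernel; u encodes the
% (centered) likelihood ratio between the two distributions.
u \;\in\; \operatorname{ran}\big(T^{\theta}\big), \qquad \theta > 0,
```

where larger theta corresponds to smoother alternatives. The attainable separation rate then depends on theta together with the eigenvalue decay of T, matching the two quantities highlighted in the abstract; varying theta over a range is what the expansion of the function space refers to.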

Looking forward, the authors suggest extending the work to alternative approximation techniques, such as the Nyström method, and examining their statistical and computational trade-offs. Moreover, the adaptability of their test to various kernel configurations offers potential for further exploration in diverse application domains. This research contributes to the field of statistical learning by improving the practical feasibility of theoretically robust procedures.
