The Minimax Rate of HSIC Estimation for Translation-Invariant Kernels (2403.07735v2)
Abstract: Kernel techniques are among the most influential approaches in data science and statistics. Under mild conditions, the reproducing kernel Hilbert space associated to a kernel is capable of encoding the independence of $M\ge 2$ random variables. Probably the most widespread independence measure relying on kernels is the so-called Hilbert-Schmidt independence criterion (HSIC; also referred to as distance covariance in the statistics literature). Despite various existing HSIC estimators designed since its introduction close to two decades ago, the fundamental question of the rate at which HSIC can be estimated is still open. In this work, we prove that the minimax optimal rate of HSIC estimation on $\mathbb R^d$ for Borel measures containing the Gaussians with continuous bounded translation-invariant characteristic kernels is $\mathcal O\!\left(n^{-1/2}\right)$. Specifically, our result implies the optimality in the minimax sense of many of the most frequently used estimators (including the U-statistic, the V-statistic, and the Nyström-based one) on $\mathbb R^d$.
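For readers unfamiliar with the estimators named in the abstract, the following is a minimal sketch of the V-statistic estimator for $M=2$ variables, assuming the commonly used form $\frac{1}{n^2}\operatorname{tr}(KHLH)$ with centering matrix $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$, which estimates the squared HSIC. The Gaussian kernel, the bandwidth of 1, the function names (`gaussian_gram`, `hsic_v_statistic`), and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel, a bounded, continuous,
    translation-invariant, characteristic kernel on R^d."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D array as n samples in R^1
    sq_norms = np.sum(x**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def hsic_v_statistic(x, y, bandwidth_x=1.0, bandwidth_y=1.0):
    """V-statistic estimate of squared HSIC: trace(K H L H) / n^2."""
    K = gaussian_gram(x, bandwidth_x)
    L = gaussian_gram(y, bandwidth_y)
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / n**2


# Synthetic illustration (assumed data, not from the paper).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=(n, 1))
y_indep = rng.normal(size=(n, 1))          # independent of x
y_dep = x + 0.1 * rng.normal(size=(n, 1))  # strongly dependent on x

hsic_indep = hsic_v_statistic(x, y_indep)
hsic_dep = hsic_v_statistic(x, y_dep)
print(f"independent pair: {hsic_indep:.5f}")
print(f"dependent pair:   {hsic_dep:.5f}")
```

The dense computation costs $O(n^2)$ memory and $O(n^3)$ time as written because of the explicit matrix products; it is meant only to make the definition concrete, not to be efficient.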