Adaptive learning of density ratios in RKHS (2307.16164v3)
Abstract: Estimating the ratio of two probability densities from finitely many samples drawn from each is a central problem in machine learning and statistics, with applications in two-sample testing, divergence estimation, generative modeling, covariate shift adaptation, conditional density estimation, and novelty detection. In this work, we analyze a large class of density ratio estimation methods that minimize a regularized Bregman divergence between the true density ratio and a model in a reproducing kernel Hilbert space (RKHS). We derive new finite-sample error bounds, and we propose a Lepskii-type parameter choice principle that minimizes the bounds without knowledge of the regularity of the density ratio. In the special case of quadratic loss, our method adaptively achieves a minimax optimal error rate. A numerical illustration is provided.
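The quadratic-loss special case admits a closed form: a regularized least-squares (KuLSIF-style) estimate of r = p/q solves a single linear system over the denominator sample, and a Lepskii-type rule then scans a grid of regularization parameters, keeping the largest one whose estimate stays within a multiple of a variance bound of every less-regularized estimate. Below is a minimal sketch of that pipeline, not the paper's exact algorithm: the Gaussian kernel, the variance proxy v(lam) = c/(lam*sqrt(n)), the constant 4, and the empirical L2 norm on the denominator sample are illustrative stand-ins for the paper's finite-sample bounds.

```python
# Sketch: quadratic-loss density ratio estimation in an RKHS (KuLSIF-style
# closed form) with a Lepskii-type balancing rule over a lambda grid.
# The variance proxy v(lam) and the constant 4 are illustrative placeholders.
import numpy as np

def gauss_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kulsif_fit(x_p, x_q, lam, sigma=1.0):
    """Closed-form minimizer of the regularized quadratic objective
    (1/2n_q) sum r(x_q)^2 - (1/n_p) sum r(x_p) + (lam/2)||r||_H^2;
    returns a callable estimate of r = p/q."""
    n_p, n_q = len(x_p), len(x_q)
    K_qq = gauss_kernel(x_q, x_q, sigma)
    K_qp = gauss_kernel(x_q, x_p, sigma)
    # Stationarity gives: (K_qq + lam * n_q * I) alpha = -(1/(lam n_p)) K_qp 1
    rhs = -K_qp.sum(axis=1) / (lam * n_p)
    alpha = np.linalg.solve(K_qq + lam * n_q * np.eye(n_q), rhs)

    def r_hat(x):
        return (gauss_kernel(x, x_q, sigma) @ alpha
                + gauss_kernel(x, x_p, sigma).sum(axis=1) / (lam * n_p))
    return r_hat

def lepskii_choice(x_p, x_q, lams, sigma=1.0, c=1.0):
    """Lepskii-type balancing: pick the largest lam whose estimate stays
    within 4*v(lam_k) of every estimate with smaller lam_k, measured by
    an empirical L2 norm on the q-sample (a proxy for the RKHS norm)."""
    lams = np.sort(np.asarray(lams))          # increasing regularization
    n = min(len(x_p), len(x_q))
    v = c / (lams * np.sqrt(n))               # stand-in variance bound
    preds = [kulsif_fit(x_p, x_q, l, sigma)(x_q) for l in lams]
    best = 0
    for m in range(len(lams)):
        if all(np.sqrt(np.mean((preds[m] - preds[k]) ** 2)) <= 4 * v[k]
               for k in range(m)):
            best = m
    return lams[best]

# Toy usage: estimate r(x) = p(x)/q(x) with p = N(0.5, 1) and q = N(0, 1).
rng = np.random.default_rng(0)
x_p = rng.normal(0.5, 1.0, size=(200, 1))
x_q = rng.normal(0.0, 1.0, size=(200, 1))
lam = lepskii_choice(x_p, x_q, lams=np.logspace(-3, 0, 8))
r_hat = kulsif_fit(x_p, x_q, lam)
print("chosen lambda:", lam, " r_hat(0) =", r_hat(np.zeros((1, 1)))[0])
```

The key point of the Lepskii rule is that it needs only the (computable) variance side of the error bound, not the unknown regularity of the true density ratio, which is what makes the parameter choice adaptive.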
Authors: Werner Zellinger, Stefan Kindermann, Sergei V. Pereverzyev