
Adaptive learning of density ratios in RKHS (2307.16164v3)

Published 30 Jul 2023 in cs.LG, math.ST, stat.ML, and stat.TH

Abstract: Estimating the ratio of two probability densities from finitely many observations drawn from the densities is a central problem in machine learning and statistics, with applications in two-sample testing, divergence estimation, generative modeling, covariate shift adaptation, conditional density estimation, and novelty detection. In this work, we analyze a large class of density ratio estimation methods that minimize a regularized Bregman divergence between the true density ratio and a model in a reproducing kernel Hilbert space (RKHS). We derive new finite-sample error bounds, and we propose a Lepskii-type parameter choice principle that minimizes the bounds without knowledge of the regularity of the density ratio. In the special case of quadratic loss, our method adaptively achieves a minimax-optimal error rate. A numerical illustration is provided.
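
To make the quadratic-loss special case concrete, the following Python sketch fits a kernelized least-squares (uLSIF/KuLSIF-style) ratio model by solving the regularized linear system implied by the representer theorem, and then selects the regularization parameter with a generic Lepskii/balancing rule over a geometric grid. This is a minimal illustration under stated assumptions, not the paper's implementation: the Gaussian kernel, the grid, the empirical comparison norm, the variance proxy c/sqrt(n*lambda), the factor 4, and all function names are illustrative choices rather than the estimator and the finite-sample bound analyzed in the paper.

import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / (2.0 * sigma ** 2))


def kulsif_fit(x_de, x_nu, lam, sigma=1.0):
    """Quadratic-loss (least-squares) RKHS fit of the ratio r ~ p_nu / p_de.

    Minimizes (1/2n) sum_i r(x_de_i)^2 - (1/m) sum_j r(x_nu_j) + (lam/2) ||r||_H^2.
    Stationarity gives r(.) = sum_i alpha_i k(., x_de_i) + sum_j beta_j k(., x_nu_j)
    with beta_j = 1/(lam*m) and (K_dd + n*lam*I) alpha = -(1/(lam*m)) K_dn 1.
    """
    n, m = len(x_de), len(x_nu)
    K_dd = gaussian_kernel(x_de, x_de, sigma)
    K_dn = gaussian_kernel(x_de, x_nu, sigma)
    beta = np.full(m, 1.0 / (lam * m))
    rhs = -K_dn.sum(axis=1) / (lam * m)
    alpha = np.linalg.solve(K_dd + n * lam * np.eye(n), rhs)
    return alpha, beta


def ratio_eval(x, x_de, x_nu, alpha, beta, sigma=1.0):
    """Evaluate the fitted ratio model at the points x (clipped at zero)."""
    vals = gaussian_kernel(x, x_de, sigma) @ alpha + gaussian_kernel(x, x_nu, sigma) @ beta
    return np.maximum(vals, 0.0)


def lepskii_choice(x_de, x_nu, lam_grid, c=1.0, sigma=1.0):
    """Generic balancing (Lepskii-type) rule over a grid of regularization parameters.

    Returns the largest lambda whose estimate agrees with every less-regularized
    estimate up to a multiple of that estimate's variance proxy S(lam) = c/sqrt(n*lam).
    The proxy, the factor 4, and the empirical comparison norm are placeholders for
    the finite-sample bound that the paper's rule actually minimizes.
    """
    n = len(x_de)
    lams = np.sort(np.asarray(lam_grid, dtype=float))[::-1]   # most regularized first
    fits = [kulsif_fit(x_de, x_nu, lam, sigma) for lam in lams]
    vals = [ratio_eval(x_de, x_de, x_nu, a, b, sigma) for a, b in fits]
    S = c / np.sqrt(n * lams)                                  # grows as lambda shrinks
    for j in range(len(lams)):
        ok = all(
            np.sqrt(np.mean((vals[j] - vals[i]) ** 2)) <= 4.0 * S[i]
            for i in range(j + 1, len(lams))
        )
        if ok:
            return lams[j], fits[j]
    return lams[-1], fits[-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_de = rng.normal(0.0, 1.0, size=(200, 1))   # denominator sample, x ~ q
    x_nu = rng.normal(0.5, 1.0, size=(200, 1))   # numerator sample,  x ~ p
    lam_grid = 1.0 * 0.5 ** np.arange(10)        # geometric grid of candidate lambdas
    lam_star, (alpha, beta) = lepskii_choice(x_de, x_nu, lam_grid)
    print("selected lambda:", lam_star)
    print("estimated ratio at x=0:", ratio_eval(np.zeros((1, 1)), x_de, x_nu, alpha, beta))

The selection loop stops at the most regularized estimate that still agrees, up to the proxy variance level, with every estimate obtained under weaker regularization. This mirrors the general Lepskii idea of trading a computable variance term against an unknown bias, so that the regularity of the target ratio never needs to be known.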

Authors (3)
  1. Werner Zellinger (19 papers)
  2. Stefan Kindermann (31 papers)
  3. Sergei V. Pereverzyev (6 papers)
Citations (2)
