Overcoming Saturation in Density Ratio Estimation by Iterated Regularization (2402.13891v2)

Published 21 Feb 2024 in cs.LG and stat.ML

Abstract: Estimating the ratio of two probability densities from finitely many samples is a central task in machine learning and statistics. In this work, we show that a large class of kernel methods for density ratio estimation suffers from error saturation, which prevents algorithms from achieving fast error convergence rates on highly regular learning problems. To resolve saturation, we introduce iterated regularization in density ratio estimation to achieve fast error rates. Our methods outperform their non-iteratively regularized counterparts on benchmarks for density ratio estimation as well as on large-scale evaluations for importance-weighted ensembling of deep unsupervised domain adaptation models.
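To make the idea concrete, below is a minimal sketch of how iterated (Tikhonov) regularization can be applied to a kernel least-squares density ratio estimator in the style of uLSIF. It is an illustration under stated assumptions, not the paper's exact estimator: the Gaussian kernel, the choice of kernel centers, the synthetic data, and the helper names `gaussian_kernel` and `iterated_ulsif` are all assumptions made here for the example. Setting `n_iter=1` recovers the standard (non-iterated) regularized solve; larger `n_iter` is the iterated variant intended to mitigate saturation.

```python
import numpy as np

def gaussian_kernel(X, C, sigma=1.0):
    # Pairwise Gaussian kernel between rows of X and centers C.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def iterated_ulsif(x_p, x_q, centers, lam=0.1, sigma=1.0, n_iter=3):
    """uLSIF-style density ratio fit with iterated Tikhonov regularization (illustrative).

    x_p: samples from the numerator density p
    x_q: samples from the denominator density q
    Returns weights alpha such that r_hat(x) = gaussian_kernel(x, centers, sigma) @ alpha.
    """
    K_q = gaussian_kernel(x_q, centers, sigma)    # (n_q, m) kernel values under q
    K_p = gaussian_kernel(x_p, centers, sigma)    # (n_p, m) kernel values under p
    H = K_q.T @ K_q / len(x_q)                    # empirical second-moment matrix under q
    h = K_p.mean(axis=0)                          # empirical first moment under p
    m = H.shape[0]
    A = H + lam * np.eye(m)
    alpha = np.zeros(m)
    # Iterated Tikhonov: alpha_k = (H + lam I)^{-1} (h + lam * alpha_{k-1}).
    # n_iter = 1 is plain Tikhonov regularization.
    for _ in range(n_iter):
        alpha = np.linalg.solve(A, h + lam * alpha)
    return alpha

# Toy usage on synthetic data: p = N(0.5, 1), q = N(0, 1).
rng = np.random.default_rng(0)
x_p = rng.normal(0.5, 1.0, size=(500, 1))
x_q = rng.normal(0.0, 1.0, size=(500, 1))
centers = x_p[:100]
alpha = iterated_ulsif(x_p, x_q, centers, lam=0.1, sigma=0.7, n_iter=3)
x_test = np.linspace(-3, 3, 5).reshape(-1, 1)
r_hat = gaussian_kernel(x_test, centers, 0.7) @ alpha
print(r_hat)  # estimated density ratio p/q at the test points
```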
