
A Short Survey on Importance Weighting for Machine Learning (2403.10175v2)

Published 15 Mar 2024 in cs.LG, cs.AI, and stat.ML

Abstract: Importance weighting is a fundamental procedure in statistics and machine learning that weights the objective function or probability distribution based on the importance of the instance in some sense. The simplicity and usefulness of the idea has led to many applications of importance weighting. For example, it is known that supervised learning under an assumption about the difference between the training and test distributions, called distribution shift, can guarantee statistically desirable properties through importance weighting by their density ratio. This survey summarizes the broad applications of importance weighting in machine learning and related research.


Summary

  • The paper provides a comprehensive survey on importance weighting, elucidating its role in correcting distribution shifts in machine learning.
  • It details methodologies like Kernel Mean Matching and Least-Squares Importance Fitting for accurate density ratio estimation under covariate shift.
  • The survey highlights applications in domain adaptation and robust optimization, offering actionable insights for addressing real-world data challenges.

Importance Weighting in Machine Learning: An Overview

The paper "A Short Survey on Importance Weighting for Machine Learning" by Masanari Kimura and Hideitsu Hino provides a comprehensive exploration of importance weighting in machine learning. By laying out its foundational principles and surveying its diverse applications, the authors elucidate the method's pivotal role across learning paradigms, particularly in addressing dataset and distribution shifts.

Importance weighting is a statistical technique that reweights the training objective function or the underlying probability distribution according to the importance of each instance. It addresses the challenge posed by distribution shifts between training and test data, allowing models to achieve robust performance despite differences in the data-generating distributions.
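
The basic correction can be sketched in a few lines. The following toy example (not from the paper) constructs a covariate-shift setup in which the true density ratio is known by construction; in practice it must be estimated. An importance-weighted least-squares fit targets the test distribution, while the unweighted fit targets the training distribution.

```python
import numpy as np

# Toy covariate shift: inputs drawn from N(0,1) at training time but from
# N(1,1) at test time, with a fixed target function y = sin(x).
rng = np.random.default_rng(0)
x_tr = rng.normal(0.0, 1.0, 2000)
y_tr = np.sin(x_tr)

# Here the density ratio w(x) = p_test(x) / p_train(x) is known in closed
# form because both densities are Gaussian with unit variance.
w = np.exp(-(x_tr - 1.0) ** 2 / 2) / np.exp(-x_tr ** 2 / 2)

# Fit a linear model by ordinary and by importance-weighted least squares.
X = np.column_stack([x_tr, np.ones_like(x_tr)])
coef_u = np.linalg.lstsq(X, y_tr, rcond=None)[0]
coef_w = np.linalg.lstsq(X * np.sqrt(w)[:, None], y_tr * np.sqrt(w),
                         rcond=None)[0]

# Evaluate both fits on data from the test distribution.
x_te = rng.normal(1.0, 1.0, 2000)
y_te = np.sin(x_te)
X_te = np.column_stack([x_te, np.ones_like(x_te)])
mse_u = np.mean((X_te @ coef_u - y_te) ** 2)
mse_w = np.mean((X_te @ coef_w - y_te) ** 2)
```

The weighted fit achieves lower test error because it minimizes (an estimate of) the risk under the test distribution rather than the training distribution.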

Core Concepts and Methods

The concept of importance weighting is closely tied to the problem of distribution shift. One canonical example is the use of density ratio estimation to correct the bias incurred under covariate shift, a scenario in which the marginal distributions of the training and test inputs differ while the conditional distribution of outputs given inputs remains unchanged. The authors also review a variety of density ratio estimation methods for obtaining accurate importance weights, discussing the efficiency of approaches such as Kernel Mean Matching and Least-Squares Importance Fitting.
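
To make the density-ratio idea concrete, here is a minimal numpy sketch of (unconstrained) Least-Squares Importance Fitting. The kernel width `sigma`, regularization `lam`, and basis size are fixed arbitrarily for illustration; the original method selects them by cross-validation.

```python
import numpy as np

def ulsif_weights(x_train, x_test, sigma=1.0, lam=0.1, n_basis=100):
    """Estimate importance weights w(x) = p_test(x) / p_train(x) at the
    training points by least-squares fitting of the density ratio."""
    rng = np.random.default_rng(0)
    idx = rng.choice(len(x_test), size=min(n_basis, len(x_test)),
                     replace=False)
    centers = x_test[idx]

    def phi(x):
        # Gaussian kernel basis: one column per center.
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    Phi_tr, Phi_te = phi(x_train), phi(x_test)
    H = Phi_tr.T @ Phi_tr / len(x_train)   # second moment under p_train
    h = Phi_te.mean(axis=0)                # first moment under p_test
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return np.maximum(Phi_tr @ alpha, 0.0)  # clip negative weights
```

On a shifted Gaussian example, the estimated weights increase in the direction of the shift, as the true ratio does.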

Applications in Distribution Shift

Importance weighting finds substantial applications in managing different distribution shifts:

  • Covariate Shift: Reweighting by the density ratio corrects the bias in the empirical risk caused by discrepancies between the training and test input distributions.
  • Target Shift and Sample Selection Bias: Importance weighting accounts for changes in the distribution of the target variables and yields unbiased estimates under selection bias.
  • Subpopulation Shift and Feedback Shift: Techniques such as uncertainty-aware mixup and feedback shift correction leverage importance weighting for robustness against these shifts.
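
Target shift is the simplest case to illustrate, because the importance weight depends only on the label. The following sketch uses made-up class priors: a classifier that errs only on the minority class looks accurate under the training priors, but the class-prior-weighted risk reveals its true risk under the test priors.

```python
import numpy as np

# Target (label) shift: class priors change between training and deployment,
# while p(x | y) stays fixed, so w(y) = p_test(y) / p_train(y).
p_train = np.array([0.9, 0.1])     # empirical class priors at training time
p_test = np.array([0.5, 0.5])      # assumed class priors at test time
w_class = p_test / p_train

y = np.array([0] * 90 + [1] * 10)  # labels drawn according to p_train
# Suppose a classifier errs exactly on the minority class (loss 1) and is
# correct on the majority class (loss 0).
losses = (y == 1).astype(float)

risk_unweighted = losses.mean()               # 0.1: looks fine under p_train
risk_weighted = np.mean(w_class[y] * losses)  # 0.5: the risk under p_test
```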

Advanced Topics in Domain Adaptation and Robust Optimization

Domain adaptation, especially in multi-source and open-set scenarios, also benefits from importance weighting. By adjusting the sample weights from the source to the target domain, models generalize better under complex and novel conditions. Distributionally Robust Optimization (DRO) is discussed as an extension that tackles even more challenging settings by optimizing against worst-case distribution shifts:

  1. Domain Adaptation: Importance weighting is pivotal in transferring learning across different domains while mitigating shifts in data distributions.
  2. Distributionally Robust Optimization: The authors draw connections between DRO and importance weighting, showing how the latter is a simple yet effective technique to counteract adverse distribution shifts, thereby ensuring model robustness.
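
The DRO connection can be made concrete with one simple member of the family: when the uncertainty set contains all reweightings of the empirical distribution whose weights are bounded by 1/(αn), the worst-case risk equals the Conditional Value-at-Risk, i.e. the mean of the worst α-fraction of losses (the sketch below is exact when αn is an integer).

```python
import numpy as np

def cvar_risk(losses, alpha=0.2):
    """Worst-case risk over reweightings with per-example weight at most
    1/(alpha * n): the mean of the worst alpha-fraction of losses."""
    n = len(losses)
    k = max(1, int(np.ceil(alpha * n)))
    return np.sort(losses)[-k:].mean()

losses = np.array([0.1, 0.2, 0.3, 0.4, 5.0])
avg = losses.mean()             # average risk: 1.2
worst = cvar_risk(losses, 0.2)  # worst 20%: 5.0
```

Minimizing `cvar_risk` instead of `avg` focuses training on the hardest examples, which is exactly the adversarial-reweighting view of DRO.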

Constraints and Challenges

While its advantages are highlighted, the authors also critically discuss the limitations of importance weighting, particularly in deep learning. Studies have observed that in over-parameterized models the effect of importance weights diminishes over the course of training, suggesting that additional mechanisms such as regularization or early stopping are needed to preserve their benefit.
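
A small numerical sketch (not from the paper) shows this effect already for linear logistic regression on separable data: the weights shift the decision boundary early in training, but under continued gradient descent the boundary drifts back toward the max-margin separator, which does not depend on the weights.

```python
import numpy as np

# Logistic regression on linearly separable 1-D data, class 0 upweighted 5x.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = np.array([5.0, 5.0, 1.0, 1.0])

a, b, lr = 0.1, 0.0, 0.1
boundaries = {}
for t in range(1, 100001):
    p = 1.0 / (1.0 + np.exp(-(a * x + b)))
    a -= lr * np.sum(w * (p - y) * x)  # gradient of the weighted logistic loss
    b -= lr * np.sum(w * (p - y))
    if t in (1000, 100000):
        boundaries[t] = -b / a         # decision boundary location

# Early on, the upweighted class pushes the boundary away from itself; with
# further training the boundary moves back toward the max-margin point at 0.
```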

Future Directions and Implications

The paper underscores the potential of importance weighting on newer frontiers such as LLMs and modern neural network architectures. As generalization research on LLMs advances, understanding the nuances of importance weighting can help build frameworks that remain robust under substantial changes in the input distribution, which motivates systematic investigation of its role in training under these complex conditions.

Conclusion

In summary, the paper characterizes importance weighting as a cornerstone technique for reconciling differing distributions in machine learning. By reviewing its applications across domains and discussing its methodological components and future potential, Kimura and Hino present a well-rounded treatise that advances our understanding of this fundamental technique and offers a guiding framework for researchers tackling the distributional challenges inherent in real-world data.
