Nearest Neighbor Sampling for Covariate Shift Adaptation (2312.09969v2)

Published 15 Dec 2023 in stat.ML and cs.LG

Abstract: Many existing covariate shift adaptation methods estimate sample weights applied to loss values to mitigate the gap between the source and the target distribution. However, estimating the optimal weights typically involves computationally expensive matrix inversion and hyper-parameter tuning. In this paper, we propose a new covariate shift adaptation method that avoids estimating the weights. The basic idea is to directly work on unlabeled target data, labeled according to the $k$-nearest neighbors in the source dataset. Our analysis reveals that setting $k = 1$ is an optimal choice. This property removes the necessity of tuning the only hyper-parameter $k$ and leads to a running time quasi-linear in the sample size. Our results include sharp rates of convergence for our estimator, with a tight control of the mean square error and explicit constants. In particular, the variance of our estimators has the same rate of convergence as for standard parametric estimation despite their non-parametric nature. The proposed estimator shares similarities with some matching-based treatment effect estimators used, e.g., in biostatistics, econometrics, and epidemiology. Our experiments show that it achieves a drastic reduction in running time with remarkable accuracy.
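
The abstract summarizes the mechanism compactly: instead of estimating importance weights, one pseudo-labels each unlabeled target covariate with the response of its nearest neighbor in the source sample and then works with the pseudo-labeled target data directly. The sketch below is one plausible reading of that recipe, not the authors' implementation; the synthetic data, the predictor f, and the use of scikit-learn's KDTree are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)

# Source sample: covariates X_s with observed labels y_s.
# Target sample: covariates X_t only (labels unobserved under covariate shift).
n_source, n_target, d = 2000, 500, 3
X_s = rng.normal(size=(n_source, d))
beta = np.array([1.0, -2.0, 0.5])              # hypothetical regression coefficients
y_s = X_s @ beta + 0.1 * rng.normal(size=n_source)
X_t = rng.normal(loc=0.5, size=(n_target, d))  # shifted covariate distribution

# 1-NN pseudo-labeling: each target point inherits the label of its nearest
# source neighbor (k = 1, the setting the paper's analysis singles out).
tree = KDTree(X_s)                             # tree-based search keeps lookups fast
_, idx = tree.query(X_t, k=1)
y_t_pseudo = y_s[idx[:, 0]]

# Example downstream use: plug the pseudo-labeled target sample into a
# target-domain risk estimate for a fixed predictor f, with no weight estimation.
f = lambda X: X @ beta                         # hypothetical predictor to evaluate
mse_target = np.mean((f(X_t) - y_t_pseudo) ** 2)
print(f"estimated target-domain MSE: {mse_target:.4f}")
```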
