Classification Tree Pruning Under Covariate Shift (2305.04335v2)
Abstract: We consider the problem of \emph{pruning} a classification tree, that is, selecting a suitable subtree that balances bias and variance, in common situations with inhomogeneous training data. Namely, assuming access to mostly data from a distribution $P_{X, Y}$, but little data from a desired distribution $Q_{X, Y}$ with different $X$-marginals, we present the first efficient procedure for optimal pruning in such situations, when cross-validation and other penalized variants are grossly inadequate. Optimality is derived with respect to a notion of \emph{average discrepancy} $P_{X} \to Q_{X}$ (averaged over $X$ space) which significantly relaxes a recent notion -- termed \emph{transfer-exponent} -- shown to tightly capture the limits of classification under such a distribution shift. Our relaxed notion can be viewed as a measure of \emph{relative dimension} between distributions, as it relates to existing notions of information such as the Minkowski and Rényi dimensions.
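To make the setting concrete, the sketch below illustrates why ordinary source-only model selection can go wrong under covariate shift, and what a target-aware alternative looks like: a deep tree is grown on abundant source ($P$) data, and the pruning level is then chosen using a small target ($Q$) sample. This is a hypothetical illustration using scikit-learn's cost-complexity pruning, not the paper's procedure; the synthetic distributions, sample sizes, and selection rule are all assumptions made for the example.

```python
# Hypothetical sketch of pruning under covariate shift (NOT the paper's
# algorithm): grow a tree on plentiful source data from P, then pick the
# cost-complexity pruning level by scoring on a small target sample from Q.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def label(x):
    # Same regression function for P and Q: only the X-marginals differ,
    # which is exactly the covariate-shift setting of the abstract.
    return (x[:, 0] + x[:, 1] > 1.0).astype(int)

# Source P: abundant labels, X uniform on the unit square.
Xp = rng.uniform(0.0, 1.0, size=(2000, 2))
yp = label(Xp)

# Target Q: few labels, X concentrated near the decision boundary.
Xq = 0.5 + 0.15 * rng.standard_normal((100, 2))
yq = label(Xq)

# Grow a deep tree on source data and enumerate its pruning levels.
full = DecisionTreeClassifier(random_state=0).fit(Xp, yp)
alphas = full.cost_complexity_pruning_path(Xp, yp).ccp_alphas

# Select the pruning level that performs best on the target sample.
best_alpha, best_acc = 0.0, -1.0
for a in alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(Xp, yp)
    acc = tree.score(Xq, yq)
    if acc > best_acc:
        best_alpha, best_acc = a, acc

print(f"chosen alpha={best_alpha:.5f}, target accuracy={best_acc:.3f}")
```

Because the candidate set includes $\alpha = 0$ (the unpruned tree), the target accuracy of the selected subtree can never fall below that of the full tree; the interesting question, which the paper addresses with guarantees, is how to do this selection optimally when the target sample is too small for reliable cross-validation.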