Stability via resampling: statistical problems beyond the real line (2405.09511v2)
Abstract: Model averaging techniques based on resampling methods (such as bootstrapping or subsampling) have been used across many areas of statistics, often with the explicit goal of promoting stability in the resulting output. We provide a general, finite-sample theoretical result guaranteeing the stability of bagging when applied to algorithms that return outputs in a general space, so that the output is not necessarily real-valued; for example, an algorithm may estimate a vector of weights or a density function. We empirically assess the stability of bagging on synthetic and real-world data for a range of problem settings, including causal inference, nonparametric regression, and Bayesian model selection.
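As an illustration of the kind of stability the abstract refers to, the following is a minimal sketch, not the paper's experimental code. It assumes a hypothetical base algorithm (`base_algorithm`) that returns a one-hot weight vector, averages its outputs over subsamples drawn without replacement (`subbag`), and measures how far the output moves in Euclidean norm when a single data point is removed (`loo_perturbations`). All function names, the subsample size `m = n // 2`, the number of bags, and the synthetic data are assumptions made for this example.

```python
import math
import random


def base_algorithm(rows):
    """Hypothetical unstable base algorithm (an assumption for illustration):
    returns a one-hot weight vector marking the column with the largest mean."""
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    best = max(range(d), key=lambda j: means[j])
    return [1.0 if j == best else 0.0 for j in range(d)]


def subbag(rows, m, n_bags, seed=0):
    """Average the base algorithm's vector output over n_bags subsamples
    of size m, each drawn without replacement."""
    rng = random.Random(seed)
    d = len(rows[0])
    total = [0.0] * d
    for _ in range(n_bags):
        sample = rng.sample(rows, m)
        out = base_algorithm(sample)
        total = [t + o for t, o in zip(total, out)]
    return [t / n_bags for t in total]


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def loo_perturbations(rows, fit):
    """Distance between the output on the full data set and the output
    with each single data point removed in turn."""
    full = fit(rows)
    return [
        l2_distance(full, fit(rows[:i] + rows[i + 1:]))
        for i in range(len(rows))
    ]


# Synthetic data: 40 points in 3 dimensions with nearly tied column means,
# so the one-hot base algorithm can flip when one point is dropped.
data_rng = random.Random(1)
data = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]

base_loo = loo_perturbations(data, base_algorithm)
bag_loo = loo_perturbations(
    data, lambda r: subbag(r, m=len(r) // 2, n_bags=200)
)
print("largest leave-one-out change, base:  ", max(base_loo))
print("largest leave-one-out change, bagged:", max(bag_loo))
```

Each leave-one-out change of the one-hot base algorithm is either 0 or sqrt(2), whereas the bagged output is an average over subsamples and can move by intermediate amounts. Because this sketch uses a finite number of random bags, part of any observed change in the bagged output is Monte Carlo noise rather than sensitivity to the removed point.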